Related Articles

1.  AthaMap web tools for the analysis and identification of co-regulated genes 
Nucleic Acids Research  2006;35(Database issue):D857-D862.
The AthaMap database generates a map of cis-regulatory elements for the whole Arabidopsis thaliana genome. This database has been extended by new tools to identify common cis-regulatory elements in specific regions of user-provided gene sets. A resulting table displays all cis-regulatory elements annotated in AthaMap including positional information relative to the respective gene. Further tables show overviews with the number of individual transcription factor binding sites (TFBS) present and TFBS common to the whole set of genes. Overrepresented cis-elements are easily identified. These features were used to detect specific enrichment of drought-responsive elements in cold-induced genes. For identification of co-regulated genes, the output table of the colocalization function was extended to show the closest genes and their relative distances to the colocalizing TFBS. Gene sets determined by this function can be used for a co-regulation analysis in microarray gene expression databases such as Genevestigator or PathoPlant. Additional improvements of AthaMap include display of the gene structure in the sequence window and a significant data increase. AthaMap is freely available at .
doi:10.1093/nar/gkl1006
PMCID: PMC1761422  PMID: 17148485
2.  AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana 
Nucleic Acids Research  2005;33(Web Server issue):W397-W402.
The AthaMap database generates a map of cis-regulatory elements for the Arabidopsis thaliana genome. AthaMap contains more than 7.4 × 10^6 putative binding sites for 36 transcription factors (TFs) from 16 different TF families. A newly implemented functionality allows the display of subsets of higher conserved transcription factor binding sites (TFBSs). Furthermore, a web tool was developed that permits a user-defined search for co-localizing cis-regulatory elements. The user can specify individually the level of conservation for each TFBS and a spacer range between them. This web tool was employed for the identification of co-localizing sites of known interacting TFs and TFs containing two DNA-binding domains. More than 1.8 × 10^5 combinatorial elements were annotated in the AthaMap database. These elements can also be used to identify more complex co-localizing elements consisting of up to four TFBSs. The AthaMap database and the connected web tools are a valuable resource for the analysis and the prediction of gene expression regulation at .
doi:10.1093/nar/gki395
PMCID: PMC1160156  PMID: 15980498
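The co-localization search described in entry 2 (two TFBSs separated by a user-defined spacer range) can be illustrated with a minimal sketch. This is not the AthaMap implementation: the function names, the 1-based inclusive `(start, end)` site representation, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Site = Tuple[int, int]  # (start, end), 1-based inclusive; hypothetical representation


def spacer(a: Site, b: Site) -> Optional[int]:
    """Number of bases between two non-overlapping sites, or None if they overlap."""
    if b[0] > a[1]:          # b lies downstream of a
        return b[0] - a[1] - 1
    if a[0] > b[1]:          # b lies upstream of a
        return a[0] - b[1] - 1
    return None              # the two sites overlap


def colocalizing_pairs(sites_a: List[Site], sites_b: List[Site],
                       min_spacer: int, max_spacer: int) -> List[Tuple[Site, Site, int]]:
    """Return (site_a, site_b, gap) for every pair whose gap lies in the spacer range."""
    pairs = []
    for a in sites_a:
        for b in sites_b:
            gap = spacer(a, b)
            if gap is not None and min_spacer <= gap <= max_spacer:
                pairs.append((a, b, gap))
    return pairs


if __name__ == "__main__":
    # Made-up coordinates for two factors on one chromosome.
    print(colocalizing_pairs([(100, 107)], [(110, 115), (200, 205), (90, 95)], 0, 10))
```

The nested loop is quadratic and is kept only for readability; a genome-scale search would more plausibly sort the sites and use a sliding window.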
3.  PathoPlant®: a platform for microarray expression data to analyze co-regulated genes involved in plant defense responses 
Nucleic Acids Research  2006;35(Database issue):D841-D845.
Plants react to pathogen attack by expressing specific proteins directed toward the infecting pathogens. This involves the transcriptional activation of specific gene sets. PathoPlant®, a database on plant–pathogen interactions and signal transduction reactions, has now been complemented by microarray gene expression data from Arabidopsis thaliana subjected to pathogen infection and elicitor treatment. New web tools enable identification of plant genes regulated by specific stimuli. Sets of genes co-regulated by multiple stimuli can be displayed as well. A user-friendly web interface was created for the submission of gene sets to be analyzed. This results in a table listing the stimuli that either induce or repress the respective genes. The search can be restricted to certain induction factors to identify, e.g. strongly up- or down-regulated genes. Up to three stimuli can be combined with the option of induction factor restriction to determine similarly regulated genes. To identify common cis-regulatory elements in co-regulated genes, a resulting gene list can directly be exported to the AthaMap database for analysis. PathoPlant is freely accessible at .
doi:10.1093/nar/gkl835
PMCID: PMC1669748  PMID: 17099232
4.  ‘MicroRNA Targets’, a new AthaMap web-tool for genome-wide identification of miRNA targets in Arabidopsis thaliana 
BioData Mining  2012;5:7.
Background
The AthaMap database generates a genome-wide map for putative transcription factor binding sites for A. thaliana. When analyzing transcriptional regulation using AthaMap it may be important to learn which genes are also post-transcriptionally regulated by inhibitory RNAs. Therefore, a unified database for transcriptional and post-transcriptional regulation will be highly useful for the analysis of gene expression regulation.
Methods
To identify putative microRNA target sites in the genome of A. thaliana, processed mature miRNAs from 243 annotated miRNA genes were used for screening with the psRNATarget web server. Positional information, target genes and the psRNATarget score for each target site were annotated to the AthaMap database. Furthermore, putative target sites for small RNAs from seven small RNA transcriptome datasets were used to determine small RNA target sites within the A. thaliana genome.
Results
A total of 41,965 putative genome-wide miRNA target sites and 10,442 miRNA target genes were identified in the A. thaliana genome. Taken together with genes targeted by small RNAs from small RNA transcriptome datasets, a total of 16,600 A. thaliana genes are putatively regulated by inhibitory RNAs. A novel web-tool, ‘MicroRNA Targets’, was integrated into AthaMap which permits the identification of genes predicted to be regulated by selected miRNAs. The predicted target genes are displayed with positional information and the psRNATarget score of the target site. Furthermore, putative target sites of small RNAs from selected tissue datasets can be identified with the new ‘Small RNA Targets’ web-tool.
Conclusions
The integration of predicted miRNA and small RNA target sites with transcription factor binding sites will be useful for AthaMap-assisted gene expression analysis. URL: http://www.athamap.de/
doi:10.1186/1756-0381-5-7
PMCID: PMC3410767  PMID: 22800758
Arabidopsis thaliana; AthaMap; MicroRNAs; Small RNAs; Post-transcriptional regulation
5.  Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes 
BMC Genomics  2014;15:317.
Background
Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes.
Results
Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions.
Conclusions
The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes.
doi:10.1186/1471-2164-15-317
PMCID: PMC4234446  PMID: 24773781
Databases; Arabidopsis thaliana; Physcomitrella patens; Yeast one-hybrid; Microarray; Transcription factor; cis-element
6.  AthaMap, integrating transcriptional and post-transcriptional data 
Nucleic Acids Research  2008;37(Database issue):D983-D986.
The AthaMap database generates a map of predicted transcription factor binding sites (TFBS) for the whole Arabidopsis thaliana genome. AthaMap has now been extended to include data on post-transcriptional regulation. A total of 403 173 genomic positions of small RNAs have been mapped in the A. thaliana genome. These identify 5772 putative post-transcriptionally regulated target genes. AthaMap tools have been modified to improve the identification of common TFBS in co-regulated genes by subtracting post-transcriptionally regulated genes from such analyses. Furthermore, AthaMap was updated to the TAIR7 genome annotation, a graphic display of gene analysis results was implemented, and the TFBS data content was increased. AthaMap is freely available at http://www.athamap.de/.
doi:10.1093/nar/gkn709
PMCID: PMC2686474  PMID: 18842622
7.  AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome 
Nucleic Acids Research  2004;32(Database issue):D368-D372.
Gene expression is controlled mainly by the binding of transcription factors to regulatory sequences. To generate a genomic map for regulatory sequences, the Arabidopsis thaliana genome was screened for putative transcription factor binding sites. Using publicly available data from the TRANSFAC database and from publications, alignment matrices for 23 transcription factors of 13 different factor families were used with the pattern search program Patser to determine the genomic positions of more than 2.4 × 10^6 putative binding sites. Due to the dense clustering of genes and the observation that regulatory sequences are not restricted to upstream regions, the prediction of binding sites was performed for the whole genome. The genomic positions and the underlying data were imported into the newly developed AthaMap database. This data can be accessed by positional information or the Arabidopsis Genome Initiative identification number. Putative binding sites are displayed in the defined region. Data on the matrices used and on the thresholds applied in these screens are given in the database. Considering the high density of sites it will be a valuable resource for generating models on gene expression regulation. The data are available at http://www.athamap.de.
doi:10.1093/nar/gkh017
PMCID: PMC308752  PMID: 14681436
8.  AthaMap-assisted transcription factor target gene identification in Arabidopsis thaliana 
The AthaMap database generates a map of potential transcription factor binding sites (TFBS) and small RNA target sites in the Arabidopsis thaliana genome. The database contains sites for 115 different transcription factors (TFs). TFBS were identified with positional weight matrices (PWMs) or with single binding sites. With the new web tool ‘Gene Identification’, it is possible to identify potential target genes for selected TFs. For these analyses, the user can define a region of interest of up to 6000 bp in all annotated genes. For TFBS determined with PWMs, the search can be restricted to high-quality TFBS. The results are displayed in tables that identify the gene, position of the TFBS and, if applicable, individual score of the TFBS. In addition, data files can be downloaded that harbour positional information of TFBS of all TFs in a region between −2000 and +2000 bp relative to the transcription or translation start site. Also, data content of AthaMap was increased and the database was updated to the TAIR8 genome release.
Database URL: http://www.athamap.de/gene_ident.php
doi:10.1093/database/baq034
PMCID: PMC3011983  PMID: 21177332
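Entry 8 mentions data files covering TFBS positions between −2000 and +2000 bp relative to the transcription or translation start site. A minimal sketch of such a strand-aware window filter is shown below. The function names and the coordinate convention (offset 0 at the start site, negative values upstream) are assumptions and may differ from the convention AthaMap actually uses.

```python
from typing import List, Tuple


def relative_position(site_pos: int, start_site: int, strand: str) -> int:
    """Offset of a site from the start site; negative means upstream of the gene."""
    if strand == "+":
        return site_pos - start_site
    # On the reverse strand, higher genomic coordinates are upstream.
    return start_site - site_pos


def sites_in_window(sites: List[int], start_site: int, strand: str,
                    upstream: int = 2000, downstream: int = 2000) -> List[Tuple[int, int]]:
    """Return (genomic position, relative offset) for sites inside the window."""
    kept = []
    for pos in sites:
        offset = relative_position(pos, start_site, strand)
        if -upstream <= offset <= downstream:
            kept.append((pos, offset))
    return kept


if __name__ == "__main__":
    # Made-up site positions around a hypothetical start site at 5000.
    print(sites_in_window([2900, 3000, 5000, 7000, 7001], 5000, "+"))
```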
9.  Targeted interactomics reveals a complex core cell cycle machinery in Arabidopsis thaliana 
A protein interactome focused towards cell proliferation was mapped comprising 857 interactions among 393 proteins, leading to many new insights in plant cell cycle regulation. A comprehensive view on heterodimeric cyclin-dependent kinase (CDK)/cyclin complexes in plants is obtained, in relation with their regulators. Over 100 new candidate cell cycle proteins were predicted.
The basic underlying mechanisms that govern the cell cycle are conserved among all eukaryotes. Peculiar for plants, however, is that their genome contains a collection of cell cycle regulatory genes that is intriguingly large (Vandepoele et al, 2002; Menges et al, 2005) compared to other eukaryotes. Arabidopsis thaliana (Arabidopsis) encodes 71 genes in five regulatory classes versus only 15 in yeast and 23 in human.
Despite the discovery of numerous cell cycle genes, little is known about the protein complex machinery that steers plant cell division. Therefore, we applied a tandem affinity purification (TAP) approach coupled with mass spectrometry (MS) on Arabidopsis cell suspension cultures to isolate and analyze protein complexes involved in the cell cycle. This approach allowed us to successfully map a first draft of the basic cell cycle complex machinery of Arabidopsis, providing many new insights into plant cell division.
To map the interactome, we relied on a streamlined platform comprising generic Gateway-based vectors with high cloning flexibility, the fast generation of transgenic suspension cultures, TAP adapted for plant cells, and matrix-assisted laser desorption ionization (MALDI) tandem-MS for the identification of purified proteins (Van Leene et al, 2007, 2008). Complexes for 102 cell cycle proteins were analyzed using this approach, leading to a non-redundant data set of 857 interactions among 393 proteins (Figure 1A). Two subspaces were identified in this data set, domain I1, containing interactions confirmed in at least two independent experimental repeats or in the reciprocal purification experiment, and domain I2 consisting of uniquely observed interactions.
Several observations underlined the quality of both domains. All tested reverse purifications found the original interaction, and 150 known or predicted interactions were confirmed, meaning that a large number of new interactions were also revealed. An in-depth computational analysis revealed enrichment for many cell cycle-related features among the proteins of the network (Figure 1B), and many protein pairs were coregulated at the transcriptional level (Figure 1C). Through integration of known cell cycle-related features, more than 100 new candidate cell cycle proteins were predicted (Figure 1D). Besides common qualities of both interactome domains, their real significance appeared through mutual differences exposing two subspaces in the cell cycle interactome: a central regulatory network of stable complexes that are repeatedly isolated and represent core regulatory units, and a peripheral network comprising transient interactions identified less frequently, which are involved in other aspects of the process, such as crosstalk between core complexes or connections with other pathways. To evaluate the biological relevance of the cell cycle interactome in plants, we validated interactions from both domains by a transient split-luciferase assay in Arabidopsis plants (Marion et al, 2008), further sustaining the hypothesis-generating power of the data set to understand plant growth.
With respect to insights into the cell cycle physiology, the interactome was subdivided according to the functional classes of the baits and core protein complexes were extracted, covering cyclin-dependent kinase (CDK)/cyclin core complexes together with their positive and negative regulation networks, DNA replication complexes, the anaphase-promoting complex, and spindle checkpoint complexes. The data imply that mitotic A- and B-type cyclins exclusively form heterodimeric complexes with the plant-specific B-type CDKs and not with CDKA;1, whereas D-type cyclins seem to associate with CDKA;1. Besides the extraction of complexes previously shown in other organisms, our data also suggested many new functional links; for example, the link coupling cell division with the regulation of transcript splicing. The association of negative regulators of CDK/cyclin complexes with transcription factors suggests that their role in reallocation is not solely targeted to CDK/cyclin complexes. New members of the Siamese-related inhibitory proteins were identified, and for the first time potential inhibitors of plant-specific mitotic B-type CDKs have been found in plants. New evidence that the E2F–DP–RBR network is not only active at G1-to-S, but also at the G2-to-M transition is provided and many complexes involved in DNA replication or repair were isolated. For the first time, a plant APC has been isolated biochemically, identifying three potential new plant-specific APC interactors, and finally, complexes involved in the spindle checkpoint were isolated mapping many new but specific interactions.
Finally, to get a general view on the complex machinery, modules of interacting cyclins and core cell cycle regulators were ranked along the cell cycle phases according to the transcript expression peak of the cyclins, showing an assorted set of CDK–cyclin complexes with high regulatory differentiation (Figure 4). Even within the same subfamily (e.g. cyclin A3, B1, B2, D3, and D4), cyclins differ not only in their functional time frame but also in the type and number of CDKs, inhibitors, and scaffolding proteins they bind, further indicating their functional diversification. According to our interaction data, at least 92 different variants of CDK–cyclin complexes are found in Arabidopsis.
In conclusion, these results reflect how several rounds of gene duplication (Sterck et al, 2007) led to the evolution of a large set of cyclin paralogs and a myriad of regulators, resulting in a significant jump in the complexity of the cell cycle machinery that could accommodate unique plant-specific features such as an indeterminate mode of postembryonic development. Through their extensive regulation and connection with a myriad of up- and downstream pathways, the core cell cycle complexes might offer the plant a flexible toolkit to fine-tune cell proliferation in response to an ever-changing environment.
Cell proliferation is the main driving force for plant growth. Although genome sequence analysis revealed a high number of cell cycle genes in plants, little is known about the molecular complexes steering cell division. In a targeted proteomics approach, we mapped the core complex machinery at the heart of the Arabidopsis thaliana cell cycle control. Besides a central regulatory network of core complexes, we distinguished a peripheral network that links the core machinery to up- and downstream pathways. Over 100 new candidate cell cycle proteins were predicted and an in-depth biological interpretation demonstrated the hypothesis-generating power of the interaction data. The data set provided a comprehensive view on heterodimeric cyclin-dependent kinase (CDK)–cyclin complexes in plants. For the first time, inhibitory proteins of plant-specific B-type CDKs were discovered and the anaphase-promoting complex was characterized and extended. Important conclusions were that mitotic A- and B-type cyclins form complexes with the plant-specific B-type CDKs and not with CDKA;1, and that D-type cyclins and S-phase-specific A-type cyclins seem to be associated exclusively with CDKA;1. Furthermore, we could show that plants have evolved a combinatorial toolkit consisting of at least 92 different CDK–cyclin complex variants, which strongly underscores the functional diversification among the large family of cyclins and reflects the pivotal role of cell cycle regulation in the developmental plasticity of plants.
doi:10.1038/msb.2010.53
PMCID: PMC2950081  PMID: 20706207
Arabidopsis thaliana; cell cycle; interactome; protein complex; protein interactions
10.  AtPAN: an integrated system for reconstructing transcriptional regulatory networks in Arabidopsis thaliana 
BMC Genomics  2012;13:85.
Background
Construction of transcriptional regulatory networks (TRNs) is a priority in systems biology. Numerous high-throughput approaches, including microarray and next-generation sequencing, are extensively adopted to examine transcriptional expression patterns on the whole-genome scale; those data are helpful in reconstructing TRNs. Identifying transcription factor binding sites (TFBSs) in a gene promoter is the initial step in elucidating the transcriptional regulation mechanism. Because transcription factors usually co-regulate a common group of genes by forming regulatory modules with similar TFBSs, the combinatorial interactions of transcription factors must be modeled to reconstruct the gene regulatory networks.
Description
For systems biology applications, this work develops a novel database called Arabidopsis thaliana Promoter Analysis Net (AtPAN), capable of detecting TFBSs and their corresponding transcription factors (TFs) in a promoter or a set of promoters in Arabidopsis. For further analysis, according to the microarray expression data and literature, the co-expressed TFs and their target genes can be retrieved from AtPAN. Additionally, proteins interacting with the co-expressed TFs are also incorporated to reconstruct co-expressed TRNs. Moreover, combinatorial TFs can be detected by the frequency of TFBSs co-occurrence in a group of gene promoters. In addition, TFBSs in the conserved regions between the two input sequences or homologous genes in Arabidopsis and rice are also provided in AtPAN. The output results also suggest conducting wet experiments in the future.
Conclusions
AtPAN has a user-friendly input/output interface and provides a graphical view of the TRNs. This novel and creative resource is freely available online at http://AtPAN.itps.ncku.edu.tw/.
doi:10.1186/1471-2164-13-85
PMCID: PMC3314555  PMID: 22397531
11.  AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors 
BMC Bioinformatics  2003;4:25.
Background
The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002).
Results
AGRIS is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database). AtTFDB contains information on approximately 1,400 transcription factors identified through motif searches and grouped into 34 families. AtTFDB links the sequence of the transcription factors with available mutants and, when known, with the possible genes they may regulate. AtcisDB consists of the 5' regulatory sequences of all 29,388 annotated genes with a description of the corresponding cis-regulatory elements. Users can search the databases for (i) promoter sequences, (ii) a transcription factor, (iii) the direct target genes of a specific transcription factor, or (iv) a regulatory network that consists of transcription factors and their target genes.
Conclusion
AGRIS provides the necessary software tools on Arabidopsis transcription factors and their putative binding sites on all genes to initiate the identification of transcriptional regulatory networks in the model dicotyledonous plant Arabidopsis thaliana. AGRIS can be accessed from .
doi:10.1186/1471-2105-4-25
PMCID: PMC166152  PMID: 12820902
12.  ‘In silico expression analysis’, a novel PathoPlant web tool to identify abiotic and biotic stress conditions associated with specific cis-regulatory sequences 
Using bioinformatics, putative cis-regulatory sequences can be easily identified using pattern recognition programs on promoters of specific gene sets. The abundance of predicted cis-sequences is a major challenge to associate these sequences with a possible function in gene expression regulation. To identify a possible function of the predicted cis-sequences, a novel web tool designated ‘in silico expression analysis’ was developed that correlates submitted cis-sequences with gene expression data from Arabidopsis thaliana. The web tool identifies the A. thaliana genes harbouring the sequence in a defined promoter region and compares the expression of these genes with microarray data. The result is a hierarchy of abiotic and biotic stress conditions to which these genes are most likely responsive. When testing the performance of the web tool, known cis-regulatory sequences were submitted to the ‘in silico expression analysis’ resulting in the correct identification of the associated stress conditions. When using a recently identified novel elicitor-responsive sequence, a WT-box (CGACTTTT), the ‘in silico expression analysis’ predicts that genes harbouring this sequence in their promoter are most likely Botrytis cinerea induced. Consistent with this prediction, the strongest induction of a reporter gene harbouring this sequence in the promoter is observed with B. cinerea in transgenic A. thaliana.
Database URL: http://www.pathoplant.de/expression_analysis.php.
doi:10.1093/database/bau030
PMCID: PMC3983564  PMID: 24727366
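The first step described in entry 12, identifying genes that harbour a submitted cis-sequence in a defined promoter region, can be sketched as a simple search on both strands. The gene identifiers and promoter sequences below are made up, the function names are hypothetical, and the subsequent comparison with microarray expression data is not covered.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def genes_with_motif(promoters: Dict[str, str], motif: str) -> Dict[str, List[Tuple[int, str]]]:
    """Map each gene whose promoter contains the motif to its (0-based start, strand) hits."""
    motif = motif.upper()
    patterns = (("+", motif), ("-", reverse_complement(motif)))
    hits = {}
    for gene_id, seq in promoters.items():
        seq = seq.upper()
        positions = []
        for strand, pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                positions.append((start, strand))
                start = seq.find(pattern, start + 1)  # allow overlapping matches
        if positions:
            hits[gene_id] = sorted(positions)
    return hits


if __name__ == "__main__":
    # Hypothetical promoters; CGACTTTT is the WT-box sequence quoted in the abstract.
    promoters = {
        "geneA": "AAACGACTTTTGG",
        "geneB": "TTAAAAGTCGTT",
        "geneC": "ACGTACGT",
    }
    print(genes_with_motif(promoters, "CGACTTTT"))
```

A palindromic motif would be reported once per strand at the same position; deduplicating such hits is left out to keep the sketch short.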
13.  MEME Suite: tools for motif discovery and searching 
Nucleic Acids Research  2009;37(Web Server issue):W202-W208.
The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms—MAST, FIMO and GLAM2SCAN—allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.
doi:10.1093/nar/gkp335
PMCID: PMC2703892  PMID: 19458158
14.  Gene coexpression clusters and putative regulatory elements underlying seed storage reserve accumulation in Arabidopsis 
BMC Genomics  2011;12:286.
Background
In Arabidopsis, a large number of genes involved in the accumulation of seed storage reserves during seed development have been characterized, but the relationship of gene expression and regulation underlying this physiological process remains poorly understood. A more holistic view of this molecular interplay will help in the further study of the regulatory mechanisms controlling seed storage compound accumulation.
Results
We identified gene coexpression networks in the transcriptome of developing Arabidopsis (Arabidopsis thaliana) seeds from the globular to mature embryo stages by analyzing publicly accessible microarray datasets. Genes encoding the known enzymes in the fatty acid biosynthesis pathway were found in one coexpression subnetwork (or cluster), while genes encoding oleosins and seed storage proteins were identified in another subnetwork with a distinct expression profile. In the triacylglycerol assembly pathway, only the genes encoding diacylglycerol acyltransferase 1 (DGAT1) and a putative cytosolic "type 3" DGAT exhibited a similar expression pattern with genes encoding oleosins. We also detected a large number of putative cis-acting regulatory elements in the promoter regions of these genes, and promoter motifs for LEC1 (LEAFY COTYLEDON 1), DOF (DNA-binding-with-One-Finger), GATA, and MYB transcription factors (TF), as well as SORLIP5 (Sequences Over-Represented in Light-Induced Promoters 5), are overrepresented in the promoter regions of fatty acid biosynthetic genes. The conserved CCAAT motifs for B3-domain TFs and binding sites for bZIP (basic-leucine zipper) TFs are enriched in the promoters of genes encoding oleosins and seed storage proteins.
Conclusions
Genes involved in the accumulation of seed storage reserves are expressed in distinct patterns and regulated by different TFs. The gene coexpression clusters and putative regulatory elements presented here provide a useful resource for further experimental characterization of protein interactions and regulatory networks in this process.
doi:10.1186/1471-2164-12-286
PMCID: PMC3126783  PMID: 21635767
15.  TOBFAC: the database of tobacco transcription factors 
BMC Bioinformatics  2008;9:53.
Background
Regulation of gene expression at the level of transcription is a major control point in many biological processes. Transcription factors (TFs) can activate and/or repress the transcriptional rate of target genes and vascular plant genomes devote approximately 7% of their coding capacity to TFs. Global analysis of TFs has only been performed for three complete higher plant genomes – Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa) and rice (Oryza sativa). Presently, no large-scale analysis of TFs has been made from a member of the Solanaceae, one of the most important families of vascular plants. To fill this void, we have analysed tobacco (Nicotiana tabacum) TFs using a dataset of 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering of the tobacco genome. An analytical pipeline was developed to isolate TF sequences from the GSR data set. This involved multiple (typically 10–15) independent searches with different versions of the TF family-defining domain(s) (normally the DNA-binding domain) followed by assembly into contigs and verification. Our analysis revealed that tobacco contains a minimum of 2,513 TFs representing all of the 64 well-characterised plant TF families. The number of TFs in tobacco is higher than previously reported for Arabidopsis and rice.
Results
TOBFAC: the database of tobacco transcription factors, is an integrative database that provides a portal to sequence and phylogeny data for the identified TFs, together with a large quantity of other data concerning TFs in tobacco. The database contains an individual page dedicated to each of the 64 TF families. These contain background information, domain architecture via Pfam links, a list of all sequences and an assessment of the minimum number of TFs in this family in tobacco. Downloadable phylogenetic trees of the major families are provided along with detailed information on the bioinformatic pipeline that was used to find all family members. TOBFAC also contains EST data, a list of published tobacco TFs and a list of papers concerning tobacco TFs. The sequences and annotation data are stored in relational tables using a PostgreSQL relational database management system. The data processing and analysis pipelines used the Perl programming language. The web interface was implemented in JavaScript and Perl CGI running on an Apache web server. The computationally intensive data processing and analysis pipelines were run on an Apple XServe cluster with more than 20 nodes.
Conclusion
TOBFAC is an expandable knowledgebase of tobacco TFs with data currently available for over 2,513 TFs from 64 gene families. TOBFAC integrates available sequence information, phylogenetic analysis, and EST data with published reports on tobacco TF function. The database provides a major resource for the study of gene expression in tobacco and the Solanaceae and helps to fill a current gap in studies of TF families across the plant kingdom. TOBFAC is publicly accessible at .
doi:10.1186/1471-2105-9-53
PMCID: PMC2246155  PMID: 18221524
16.  TTS Mapping: integrative WEB tool for analysis of triplex formation target DNA Sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome 
BMC Genomics  2009;10(Suppl 3):S9.
Background
DNA triplexes can naturally occur, co-localize and interact with many other regulatory DNA elements (e.g. G-quadruplex (G4) DNA motifs), specific DNA-binding proteins (e.g. transcription factors (TFs)), and micro-RNA (miRNA) precursors. Specific genome localizations of triplex target DNA sites (TTSs) may cause abnormalities in a double-helix DNA structure and can be directly involved in some human diseases. However, genome localization of specific TTSs, their interconnection with regulatory DNA elements and physiological roles in a cell are poorly defined. Therefore, it is important to identify a comprehensive and reliable catalogue of specific potential TTSs (pTTSs) and their co-localization patterns with other regulatory DNA elements in the human genome.
Results
The "TTS mapping" database is a web-based search engine developed here to find and annotate pTTSs within a region of interest of the human genome. The engine provides descriptive statistics of pTTSs in a given region and its sequence context. Different annotation tracks of TTS-overlapping gene region(s), G4 motifs, CpG islands, miRNA precursors, miRNA targets, transcription factor binding sites (TFBSs), single nucleotide polymorphisms (SNPs), small nucleolar RNAs (snoRNAs), and repeat elements are also mapped onto a sequence location provided by the UCSC genome browser, the G4 database http://www.quadruplex.org and several other datasets. The results pages provide links to UCSC genome browser annotation tracks and related databases. The BLASTN program was included to check the uniqueness of a given pTTS in the human genome. Recombination- and mutation-prone genes (e.g. EVI-1, MYC) were found to be significantly enriched in TTSs, many of which co-occur with other regulatory DNA elements. TTS mapping reveals that a highly complementary and evolutionarily conserved polypurine and polypyrimidine DNA sequence pair, linked by a short non-conserved DNA sequence, can form miR-483, which is transcribed from intron 2 of the IGF2 gene and can bind double-stranded nucleic acid TTSs to form natural triplex structures.
Conclusion
TTS mapping provides comprehensive visual and analytical tools to help users find pTTSs, G-quadruplets and other regulatory DNA elements in various genome regions. TTS Mapping not only provides sequence visualization and statistical information, but also integrates knowledge about the co-localization of TTSs with various DNA elements and facilitates analysis of those data. In particular, TTS Mapping reveals a complex structural-functional regulatory module of the IGF2 gene, including a binding site for the TF MZF1 and the ncRNA precursor mir-483 formed by the highly complementary and evolutionarily conserved polypurine- and polypyrimidine-rich DNA pair. Such ncRNAs are capable of forming helical triplex structures with the polypurine strand of a nucleic acid duplex (DNA or RNA) via Hoogsteen or reverse Hoogsteen hydrogen bonds. Our web tool could be used to discover biologically meaningful genome modules and to optimize the experimental design of anti-gene treatment.
doi:10.1186/1471-2164-10-S3-S9
PMCID: PMC2788396  PMID: 19958507
17.  PlantPAN: Plant promoter analysis navigator, for identifying combinatorial cis-regulatory elements with distance constraint in plant gene groups 
BMC Genomics  2008;9:561.
Background
The elucidation of transcriptional regulation in plant genes is an important area of research for plant scientists, following the mapping of various plant genomes, such as A. thaliana, O. sativa and Z. mays. A variety of bioinformatic servers or databases of plant promoters have been established, although most have been focused only on annotating transcription factor binding sites in a single gene and have neglected some important regulatory elements (tandem repeats and CpG/CpNpG islands) in promoter regions. Additionally, the combinatorial interaction of transcription factors (TFs) is important in regulating the gene group that is associated with the same expression pattern. Therefore, a tool for detecting the co-regulation of transcription factors in a group of gene promoters is required.
Results
This study develops a database-assisted system, PlantPAN (Plant Promoter Analysis Navigator), for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes. The system collects the plant transcription factor binding profiles from the PLACE, TRANSFAC (public release 7.0), AGRIS, and JASPAR databases and allows users to input a group of gene IDs or promoter sequences, enabling the co-occurrence of combinatorial transcription factor binding sites (TFBSs) within a defined distance (20 bp to 200 bp) to be identified. Furthermore, the new resource enables other regulatory features in a plant promoter, such as CpG/CpNpG islands and tandem repeats, to be displayed. The regulatory elements in the conserved regions of the promoters across homologous genes are detected and presented.
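The distance-constrained co-occurrence search described above can be illustrated with a minimal Python sketch. It is not PlantPAN's implementation: the function names, the promoter data, and the choice to measure distance between site start positions are all assumptions made for illustration. Only the 20 bp to 200 bp window comes from the abstract.

```python
from itertools import product

def cooccurring_pairs(sites_a, sites_b, min_dist=20, max_dist=200):
    """Return (pos_a, pos_b) pairs whose separation lies in [min_dist, max_dist] bp.

    Distance is measured between start positions (an assumption of this sketch).
    """
    pairs = []
    for a, b in product(sites_a, sites_b):
        if min_dist <= abs(a - b) <= max_dist:
            pairs.append((a, b))
    return pairs

def genes_with_combination(promoters, tf_a, tf_b, min_dist=20, max_dist=200):
    """Map each gene to its qualifying site pairs; genes with no pair are omitted."""
    hits = {}
    for gene, sites in promoters.items():
        pairs = cooccurring_pairs(
            sites.get(tf_a, []), sites.get(tf_b, []), min_dist, max_dist
        )
        if pairs:
            hits[gene] = pairs
    return hits

# Hypothetical binding-site start positions per promoter (made-up values).
promoters = {
    "geneA": {"TF1": [100, 450], "TF2": [150, 900]},
    "geneB": {"TF1": [300], "TF2": [310]},
}

result = genes_with_combination(promoters, "TF1", "TF2")
print(result)
```

In this toy input only geneA has a TF1/TF2 pair inside the window (sites 50 bp apart); the geneB sites are 10 bp apart and fall below the lower bound.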
Conclusion
In addition to providing a user-friendly input/output interface, PlantPAN has numerous advantages in the analysis of a plant promoter. Several case studies have established the effectiveness of PlantPAN. This novel analytical resource is now freely available at .
doi:10.1186/1471-2164-9-561
PMCID: PMC2633311  PMID: 19036138
18.  Boolean modeling of transcriptome data reveals novel modes of heterotrimeric G-protein action 
Classical mechanisms of heterotrimeric G-protein signaling are observed to function in regulation of the transcriptome. Conversely, many theoretical regulatory modes of the G-protein are not manifested in the transcriptomes we investigate. A new mechanism of G-protein signaling is revealed, in which the β subunit regulates gene expression identically in the presence or absence of the α subunit. We find evidence of cross-talk between G-protein-mediated and hormone-mediated transcriptional regulation. We find evidence of system specificity in G-protein signaling.
Heterotrimeric G-proteins, composed of α, β, and γ subunits, participate in a wide range of signaling pathways in eukaryotes (Morris and Malbon, 1999). According to the typical, mammalian paradigm, in its inactive state, the G-protein exists as an associated heterotrimer. G-protein signaling begins with ligand binding that results in a conformational change in a G-protein-coupled receptor (GPCR). Once activated by the GPCR, the Gα separates from the associated Gβγ dimer and the freed Gα and Gβγ proteins can then interact with downstream effector molecules, alone or in combination, to transduce the signal. Subsequent to signal propagation, Gα re-associates with the Gβγ dimer to reform the G-protein complex.
There are several classical routes for signal propagation through heterotrimeric G-proteins that have been categorized in mammalian systems (Marrari et al, 2007; Dupre et al, 2009). One route, which we designate classical I, requires the presence of both subunits, and can invoke one of two distinct mechanisms. In one mechanism, on GPCR activation, freed Gα and Gβγ each interact with downstream effectors to elicit the downstream response. In a related mechanism, Gα but not Gβγ interacts with downstream effectors, but the Gβγ dimer is nevertheless required to facilitate coupling of Gα with the relevant GPCR (Marrari et al, 2007). In a second route, which we designate classical II, it is solely the Gβγ dimer that interacts with downstream effectors; in this case, sequestration of Gβγ within the heterotrimer prevents signal propagation. In addition, a few non-classical G-protein regulatory modes have also been implicated in some systems, for example signaling by the intact heterotrimer in yeast (Klein et al, 2000; Frank et al, 2005). Observations such as these lead to a fundamental question, namely, which of all the theoretical regulatory modes of G-protein signaling are realized biologically. Our study answers this question in the context of the model plant Arabidopsis thaliana, and in addition analyzes the manner in which G-protein signaling couples with signaling by the plant hormone abscisic acid. The Arabidopsis genome encodes only one canonical Gα subunit, GPA1, and one canonical Gβ subunit, AGB1, and knockout mutants are available for each of these, allowing clear dissection of Gα- and Gβ-related phenotypes.
Abscisic acid (ABA) is a major plant hormone, which inhibits growth and promotes tolerance of abiotic stresses such as drought, salinity, and cold. ABA signaling is known to interact with heterotrimeric G-protein signaling in both developmental and stress responses in a complex manner, causing, for example, ABA hyposensitivity of guard cell stomatal opening in gpa1 and agb1 single mutants as well as agb1 gpa1 double mutants (Fan et al, 2008), but ABA hypersensitivity of the inhibition of seed germination and post-germination seedling development in the same mutants (Pandey et al, 2006). These experimental observations implicate G-proteins as one of the components of ABA signaling, but to date no systematic study has been conducted in either plant or metazoan systems to define the co-regulatory modes of a G-protein and a hormone.
In this study, we conduct genome-wide gene expression profiling in G-protein subunit mutants of A. thaliana guard cells and leaves, with or without treatment with ABA. By introducing one or more mediators acting downstream of the G-protein and ABA to control transcript levels, we propose nine G-protein/ABA signaling pathways including ABA-independent G-protein signaling pathways, G-protein-independent ABA signaling pathways, and seven distinct ABA–G-protein-coupled signaling pathways (Figure 1). We develop a Boolean modeling framework to systematically enumerate 14 possible theoretical regulatory modes of the G-protein and 142 co-regulatory modes of the G-protein and ABA, and then use a pattern matching approach to associate target genes with theoretical regulatory modes.
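A minimal Python sketch of the enumerate-then-match idea follows. It rests on assumptions rather than on the authors' actual framework: each genotype is encoded by the presence or absence of Gα and Gβ, expression is binarized to 0/1, and a regulatory mode is taken to be any non-constant truth table over the four genotypes. Under that encoding there are 16 two-input Boolean functions, and dropping the two constant ones leaves 14, which matches the count quoted above; whether the authors derived 14 this way is not stated here. The sketch does not attempt the 142 co-regulatory modes with ABA.

```python
from itertools import product

# Inputs: (G-alpha present, G-beta present) for each genotype.
GENOTYPES = {
    "wild type": (1, 1),
    "gpa1": (0, 1),        # knockout of the G-alpha subunit GPA1
    "agb1": (1, 0),        # knockout of the G-beta subunit AGB1
    "agb1 gpa1": (0, 0),   # double mutant
}

def enumerate_modes():
    """All non-constant truth tables over the four genotypes."""
    modes = []
    for outputs in product((0, 1), repeat=len(GENOTYPES)):
        if len(set(outputs)) > 1:  # skip the all-0 and all-1 tables
            modes.append(outputs)
    return modes

def match_mode(profile, modes):
    """Return the truth tables identical to a binarized expression profile."""
    key = tuple(profile[g] for g in GENOTYPES)
    return [m for m in modes if m == key]

modes = enumerate_modes()

# Hypothetical gene: "high" only when both subunits are present.
profile = {"wild type": 1, "gpa1": 0, "agb1": 0, "agb1 gpa1": 0}
matched = match_mode(profile, modes)
print(len(modes), matched)
```

The example profile matches the truth table in which expression requires both subunits, which corresponds to the kind of pattern the text labels classical I. Real data would need a statistical rule for binarizing expression levels, which this sketch leaves out.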
Our analysis shows that the G-protein regulatory mode that requires the presence of both Gα and Gβγ subunits (consistent with classical I mechanisms), is well represented in both guard cells and leaves. The G-protein regulatory mode that requires a freed Gβγ subunit (classical II G-protein regulatory mechanism) is well supported in guard cells and somewhat less so in leaves. In addition, a G-protein regulatory mode representing a non-classical regulatory mechanism is prevalent in guard cells but less so in leaves (Figure 5). In this regulatory mode, signaling by Gβ(γ) occurs, and this signaling is not regulated in any way by Gα.
By relating the target genes with the nine proposed G-protein/ABA signaling pathways, we are able to gauge the plausibility of regulatory modes of the G-protein and ABA at the pathway level. We find that G-protein-independent ABA signaling pathways are prevalent in both guard cells and leaves. The existence of an ABA-independent regulatory activity of the G-protein is well supported in guard cells, but not supported in leaves. Additive regulation by G-protein signaling plus G-protein-independent ABA signaling is rare in both guard cells and leaves. In addition, combinatorial cross-talk between G-protein signaling and ABA signaling and additive cross-talk between ABA–G-protein signaling and G-protein-independent ABA signaling are observed in both guard cells and leaves. Our transcriptome analysis indicates that in some cases, ABA definitely does not influence G-protein signaling, though it may do so in some other cases.
To investigate whether previously observed hypersensitivity or hyposensitivity of developmental and dynamic transient responses to ABA in G-protein mutants is recapitulated at the level of transcriptional regulation, we compare gene regulation by ABA in guard cells and leaves of the G-protein mutants versus wild type. We find that in guard cells, equal ABA hyposensitivity of all mutants combined is significant, although hyposensitivity in individual mutants is not. There is also a separate group of genes in guard cells that show ABA hypersensitivity in the gpa1 mutant, suggesting complex interactions between ABA and G-protein signaling in gene regulation in this cell type. In leaves, ABA hyposensitivity of gene expression in the three individual mutants and equal hyposensitivity in all mutants are strongly supported. In addition, several of the functional categories identified by our analysis of G-protein regulatory modes have been implicated in previous physiological analyses of G-protein mutants, providing validation to the biological interpretation of our results.
In summary, by conducting a genome-wide gene expression profiling study in G-protein subunit mutants of A. thaliana guard cells and leaves and developing a Boolean modeling framework, we systematically evaluate the biological utilization of mechanisms of G-protein regulatory action and reveal novel regulatory modes of the G-protein. The results generate empirical evidence and insights regarding molecular events of G-protein signaling and response at the physiological level in both plants and mammals.
Heterotrimeric G-proteins mediate crucial and diverse signaling pathways in eukaryotes. Here, we generate and analyze microarray data from guard cells and leaves of G-protein subunit mutants of the model plant Arabidopsis thaliana, with or without treatment with the stress hormone, abscisic acid. Although G-protein control of the transcriptome has received little attention to date in any system, transcriptome analysis allows us to search for potentially uncommon yet significant signaling mechanisms. We describe the theoretical Boolean mechanisms of G-protein × hormone regulation, and then apply a pattern matching approach to associate gene expression profiles with Boolean models. We find that (1) classical mechanisms of G-protein signaling are well represented. Conversely, some theoretical regulatory modes of the G-protein are not supported; (2) a new mechanism of G-protein signaling is revealed, in which Gβ regulates gene expression identically in the presence or absence of Gα; (3) guard cells and leaves favor different G-protein modes in transcriptome regulation, supporting system specificity of G-protein signaling. Our method holds significant promise for analyzing analogous ‘switch-like' signal transduction events in any organism.
doi:10.1038/msb.2010.28
PMCID: PMC2913393  PMID: 20531402
abscisic acid; Arabidopsis thaliana; Boolean modeling; heterotrimeric G-protein; transcriptome
19.  Large-Scale Discovery of Promoter Motifs in Drosophila melanogaster 
A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.
Author Summary
In contrast to the genomic sequences that encode proteins, little is known about the regulatory elements that instruct the cell as to when and where a given gene should be active. Regulatory elements are thought to consist of clusters of short DNA words (motifs), each of which acts as a binding site for a sequence-specific DNA-binding protein. Thus, building a comprehensive dictionary of such motifs is an important step towards a broader understanding of gene regulation. Using the recently published NestedMICA method for detecting overrepresented motifs in a set of sequences, we build a dictionary of 120 motifs from regulatory sequences in the fruitfly genome, 87 of which are novel. Analysis of positional biases, conservation across species, and association with specific patterns of gene expression in fruitfly embryos suggest that the great majority of these newly discovered motifs represent functional regulatory elements. In addition to providing an initial motif dictionary for one of the most intensively studied model organisms, this work provides an analytical framework for the comprehensive discovery of regulatory motifs in complex animal genomes.
doi:10.1371/journal.pcbi.0030007
PMCID: PMC1779301  PMID: 17238282
20.  CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L.) methylation filtered genomic genespace sequences 
BMC Bioinformatics  2007;8:129.
Background
Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace) recovered using methylation filtration technology and providing annotation and analysis of the sequence data.
Description
CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs) knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniProtKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the potential domains on annotated GSS were analyzed using the HMMER package against the Pfam database. The annotated GSS were also assigned with Gene Ontology annotation terms and integrated with 228 curated plant metabolic pathways from the Arabidopsis Information Resource (TAIR) knowledge base. The UniProtKB-Swiss-Prot ENZYME database was used to assign putative enzymatic function to each GSS. Each GSS was also analyzed with the Tandem Repeats Finder (TRF) program in order to identify potential SSRs for molecular marker discovery. The raw sequence data, processed annotation, and SSR results were stored in relational tables designed in key-value pair fashion using a PostgreSQL relational database management system. The biological knowledge derived from the sequence data and processed results are represented as views or materialized views in the relational database management system. All materialized views are indexed for quick data access and retrieval. Data processing and analysis pipelines were implemented using the Perl programming language. The web interface was implemented in JavaScript and Perl CGI running on an Apache web server. The CPU intensive data processing and analysis pipelines were run on a computer cluster of more than 30 dual-processor Apple XServes. A job management system called Vela was created as a robust way to submit large numbers of jobs to the Portable Batch System (PBS).
Conclusion
CGKB is an integrated and annotated resource for cowpea GSS with features of homology-based and HMM-based annotations, enzyme and pathway annotations, GO term annotation, toolkits, and a large number of other facilities to perform complex queries. The cowpea GSS, chloroplast sequences, mitochondrial sequences, retroelements, and SSR sequences are available as FASTA formatted files and downloadable at CGKB. This database and web interface are publicly accessible at .
doi:10.1186/1471-2105-8-129
PMCID: PMC1868039  PMID: 17445272
21.  MotifCombinator: a web-based tool to search for combinations of cis-regulatory motifs 
BMC Bioinformatics  2007;8:100.
Background
A combination of multiple types of transcription factors and cis-regulatory elements is often required for gene expression in eukaryotes, and the combinatorial regulation confers specific gene expression to tissues or environments. To reveal the combinatorial regulation, computational methods are developed that efficiently infer combinations of cis-regulatory motifs that are important for gene expression as measured by DNA microarrays. One promising type of computational method is to utilize regression analysis between expression levels and scores of motifs in input sequences. This type takes full advantage of information on expression levels because it does not require that the expression level of each gene be dichotomized according to whether or not it reaches a certain threshold level. However, there is no web-based tool that employs regression methods to systematically search for motif combinations and that practically handles combinations of more than two or three motifs.
Results
Here we introduce MotifCombinator, an online tool with a user-friendly interface, to systematically search for combinations composed of any number of motifs based on regression methods. The tool utilizes well-known regression methods (the multivariate linear regression, the multivariate adaptive regression spline or MARS, and the multivariate logistic regression method) for this purpose, and uses a genetic algorithm to search for combinations composed of any desired number of motifs. The visualization systems in this tool help users to intuitively grasp the process of the combination search, and the backup system allows users to easily stop and restart calculations that are expected to require large computational time. This tool also provides preparatory steps needed for systematic combination search – i.e., selecting single motifs to constitute combinations and cutting out redundant similar motifs based on clustering analysis.
Conclusion
MotifCombinator helps users to systematically search for motif combinations that play an important role in gene expression as measured by microarrays.
doi:10.1186/1471-2105-8-100
PMCID: PMC1838919  PMID: 17378935
22.  Accurate timekeeping is controlled by a cycling activator in Arabidopsis 
eLife  2013;2:e00473.
Transcriptional feedback loops are key to circadian clock function in many organisms. Current models of the Arabidopsis circadian network consist of several coupled feedback loops composed almost exclusively of transcriptional repressors. Indeed, a central regulatory mechanism is the repression of evening-phased clock genes via the binding of morning-phased Myb-like repressors to evening element (EE) promoter motifs. We now demonstrate that a related Myb-like protein, REVEILLE8 (RVE8), is a direct transcriptional activator of EE-containing clock and output genes. Loss of RVE8 and its close homologs causes a delay and reduction in levels of evening-phased clock gene transcripts and significant lengthening of clock pace. Our data suggest a substantially revised model of the circadian oscillator, with a clock-regulated activator essential both for clock progression and control of clock outputs. Further, our work suggests that the plant clock consists of a highly interconnected, complex regulatory network rather than of coupled morning and evening feedback loops.
DOI: http://dx.doi.org/10.7554/eLife.00473.001
eLife digest
We live in a world with a 24-hr cycle in which day follows night follows day with complete predictability. Life on earth has evolved to take advantage of this predictability by using circadian clocks to prepare for the coming of night (or day), and plants are no exception. Even in constant darkness, characteristics such as leaf movements show a constant cycle of around 24 hr.
Most circadian clocks rely on negative feedback loops involving various genes and proteins to keep track of time. In one of these feedback loops, certain genes—called morning-phased genes—are expressed as proteins during the day, and these proteins prevent other genes—called evening-phased genes—from producing proteins. As night approaches, however, a second feedback loop acts to stop the morning-phased genes being expressed, thus allowing the evening-phased genes to produce proteins. And as day approaches, expression of these genes is stopped and the whole cycle starts again.
Many of the genes and proteins involved in the circadian system of Arabidopsis thaliana, a small flowering plant that is widely used as a model organism, have been identified, and its circadian clock was thought to rely almost entirely on proteins called repressors that block the transcription of genes. Now, Hsu et al. have shown that the Arabidopsis clock also involves proteins that increase the expression of certain genes at specific times of the day.
Hsu et al. focused on the promoter regions of evening-phased genes: these regions are stretches of DNA that proteins called transcription factors bind to and either encourage the expression of a gene (if the protein is a transcriptional activator) or block its expression (as a transcriptional repressor). In particular, they focused on a protein called RVE8 that is most strongly expressed in the afternoon and, based on previous research, is thought to activate the transcription of genes. Using genetically modified plants in which the gene for RVE8 can be turned on and off, they found that this protein led to increases in the expression of some genes, and reductions in the expression of others.
Further analysis showed that RVE8 was able to activate the expression of evening-phased genes directly, without requiring that new proteins be made first. By contrast, morning-expressed genes were likely to be suppressed by RVE8 via an indirect mechanism that involved other proteins that had previously been activated by RVE8. The expression of RVE8 itself is regulated by other clock genes and also by an undefined post-transcriptional process. Therefore rather than consisting of a morning feedback loop coupled to an evening feedback loop, with both loops being based on repressors, the plant clock is instead better viewed as a highly connected network of activators and repressors. Further research is clearly necessary to understand this unexpected complexity in the circadian clock of Arabidopsis.
DOI: http://dx.doi.org/10.7554/eLife.00473.002
doi:10.7554/eLife.00473
PMCID: PMC3639509  PMID: 23638299
circadian rhythm; transcription factors; evening element; phase; Arabidopsis
23.  miRFANs: an integrated database for Arabidopsis thaliana microRNA function annotations 
BMC Plant Biology  2012;12:68.
Background
Plant microRNAs (miRNAs) have been revealed to play important roles in developmental control, hormone secretion, cell differentiation and proliferation, and response to environmental stresses. However, our knowledge about the regulatory mechanisms and functions of miRNAs remains very limited. The main difficulties lie in two aspects. On one hand, the number of experimentally validated miRNA targets is very limited and the predicted targets often include many false positives, which limits our ability to reveal the functions of miRNAs. On the other hand, the regulation of miRNAs is known to be spatio-temporally specific, which makes it more difficult to understand the regulatory mechanisms of miRNAs.
Description
In this paper we present miRFANs, an online database for Arabidopsis thaliana miRNA function annotations. We integrated various types of datasets, including miRNA-target interactions, transcription factors (TFs) and their targets, expression profiles, genomic annotations and pathways, into a comprehensive database, and developed various statistical and mining tools, together with a user-friendly web interface. For each miRNA target predicted by psRNATarget, TargetAlign and UEA target-finder, or recorded in TarBase and miRTarBase, the effect of its up-regulated or down-regulated miRNA on the expression level of the target gene is evaluated by carrying out differential expression analysis of both miRNA and target expression profiles acquired under the same (or similar) experimental condition and in the same tissue. Moreover, each miRNA target is associated with gene ontology and pathway terms, together with the target site information and regulating miRNAs predicted by different computational methods. These associated terms may provide valuable insight into the functions of each miRNA.
Conclusion
First, a comprehensive collection of miRNA targets for Arabidopsis thaliana provides valuable information about the functions of plant miRNAs. Second, a highly informative miRNA-mediated genetic regulatory network is extracted from our integrative database. Third, a set of statistical and mining tools is provided for analyzing and mining the database. And fourth, a user-friendly web interface is developed to facilitate the browsing and analysis of the collected data.
doi:10.1186/1471-2229-12-68
PMCID: PMC3489716  PMID: 22583976
24.  Genome-wide analysis of ABA-responsive elements ABRE and CE3 reveals divergent patterns in Arabidopsis and rice 
BMC Genomics  2007;8:260.
Background
In plants, complex regulatory mechanisms are at the core of physiological and developmental processes. The phytohormone abscisic acid (ABA) is involved in the regulation of various such processes, including stomatal closure, seed and bud dormancy, and physiological responses to cold, drought and salinity stress. The underlying tissue or plant-wide control circuits often include combinatorial gene regulatory mechanisms and networks that we are only beginning to unravel with the help of new molecular tools. The increasing availability of genomic sequences and gene expression data enables us to dissect ABA regulatory mechanisms at the individual gene expression level. In this paper we used an in-silico-based approach directed towards genome-wide prediction and identification of specific features of ABA-responsive elements. In particular we analysed the genome-wide occurrence and positional arrangements of two well-described ABA-responsive cis-regulatory elements (CREs), ABRE and CE3, in thale cress (Arabidopsis thaliana) and rice (Oryza sativa).
Results
Our results show that Arabidopsis and rice use the ABA-responsive elements ABRE and CE3 in distinct ways. Earlier reports for various monocots have identified CE3 as a coupling element (CE) associated with ABRE. Surprisingly, we found that while ABRE is equally abundant in both species, CE3 is practically absent in Arabidopsis. ABRE-ABRE pairs are common in both genomes, suggesting that these can form functional ABA-responsive complexes (ABRCs) in Arabidopsis and rice. Furthermore, we detected distinct combinations, orientation patterns and DNA strand preferences of ABRE and CE3 motifs in rice gene promoters.
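As an illustration of the kind of scan that underlies such occurrence, strand and pairing counts, the following minimal sketch searches a promoter sequence for a motif on both strands and lists neighbouring hit pairs. It is not the authors' pipeline: the function names, the example sequence and the distance cutoff are hypothetical, and the string "ACGTG" is used only as a placeholder for an ABRE-like core, not as the motif definition used in the paper.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (position, strand) hits of an exact motif on both strands.

    Positions are 0-based start coordinates on the forward strand.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    # A lookahead pattern also reports overlapping occurrences.
    for m in re.finditer(f"(?={re.escape(motif)})", seq):
        hits.append((m.start(), "+"))
    rc = reverse_complement(motif)
    if rc != motif:  # palindromic motifs would otherwise be counted twice
        for m in re.finditer(f"(?={re.escape(rc)})", seq):
            hits.append((m.start(), "-"))
    return sorted(hits)


def pairs_within(hits, max_dist):
    """Return all pairs of hits whose start positions are at most max_dist apart."""
    pairs = []
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if 0 < b[0] - a[0] <= max_dist:
                pairs.append((a, b))
    return pairs


# Hypothetical promoter fragment and placeholder motif.
promoter = "TTACGTGGCAACACGTAA"
hits = find_motif(promoter, "ACGTG")
pairs = pairs_within(hits, max_dist=20)
```

Running the same scan over every upstream region of a genome and tallying hits by strand and by pair orientation would give counts comparable in spirit to those reported here. A real analysis would also need degenerate motif matching and a background model to judge whether the counts are significant.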
Conclusion
Our computational analyses revealed distinct recruitment patterns of ABA-responsive CREs in upstream sequences of Arabidopsis and rice. The apparent absence of CE3s in Arabidopsis suggests that another CE pairs with ABRE to establish a functional ABRC capable of interacting with transcription factors. Further studies will be needed to test whether the observed differences are extrapolatable to monocots and dicots in general, and to understand how they contribute to the fine-tuning of the hormonal response. The outcome of our investigation can now be used to direct future experimentation designed to further dissect the ABA-dependent regulatory networks.
doi:10.1186/1471-2164-8-260
PMCID: PMC2000901  PMID: 17672917
25.  CoryneRegNet 4.0 – A reference database for corynebacterial gene regulatory networks 
BMC Bioinformatics  2007;8:429.
Background
Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum.
Results
Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. Like the previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto those networks. For this purpose, we added stimulon data directly into the database, and also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP-based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to predict putative contradictions or further gene regulatory interactions. Furthermore, it integrates protein clusters by means of heuristically solving the weighted graph cluster editing problem. In addition, it provides Web Service-based access to up-to-date gene annotation data from GenDB.
Conclusion
The release 4.0 of CoryneRegNet is a comprehensive system for the integrated analysis of procaryotic gene regulatory networks. It is a versatile systems biology platform to support the efficient and large-scale analysis of transcriptional regulation of gene expression in microorganisms. It is publicly available at .
doi:10.1186/1471-2105-8-429
PMCID: PMC2194740  PMID: 17986320
