The Arabidopsis Gene Regulatory Information Server (AGRIS; http://arabidopsis.med.ohio-state.edu/) provides a comprehensive resource for gene regulatory studies in the model plant Arabidopsis thaliana. Three interlinked databases, AtTFDB, AtcisDB and AtRegNet, furnish comprehensive and updated information on transcription factors (TFs), predicted and experimentally verified cis-regulatory elements (CREs) and their interactions, respectively. In addition to significant contributions to the identification of the entire set of TF–DNA interactions, which are key to understanding the gene regulatory networks that govern Arabidopsis gene expression, tools recently incorporated into AGRIS include the complete set of words of length 5–15 present in the Arabidopsis genome and the integration of AtRegNet with visualization tools, such as the recently developed ReIN application. All the information in AGRIS is publicly available and downloadable upon registration.
A first step toward elucidating the dynamic behavior of gene regulatory networks involves compiling the parts lists, formed by the transcription factors (TFs), all the gene regulatory regions and the corresponding interactions (1). Over the past few years, Arabidopsis has emerged as a model plant (2), and systems approaches that rely on easy access to genomic and gene regulation information have begun to identify the architecture of the underlying networks for this organism (3). AGRIS is dedicated to storing information on TFs and cis-regulatory elements (CREs), and to helping reveal gene regulatory networks in Arabidopsis. Unlike other community resources such as ATTED-II, which focuses primarily on co-expression analysis (4), Athena, which focuses on promoter analysis (5), or PlnTFDB (6), which harbors computationally curated information on TFs from many different plant species, AGRIS is Arabidopsis-centered, primarily hand-curated, integrates TF and promoter information primarily from experimental sources and combines them into gene regulatory networks. AGRIS is currently composed of databases of TFs (AtTFDB), predicted and experimentally demonstrated CREs (AtcisDB), and the experimentally verified physical interactions between TFs and gene regulatory regions, represented in AtRegNet and visualized by the ReIN tool.
AtTFDB contains information on regulatory proteins, where TFs are grouped into families based on the presence of conserved domains and following prior classification criteria (7). The current version of AtTFDB contains 50 families and 1773 TFs. The identification of TFs involved a combination of BLAST and motif searches based on the literature available on known TFs, and on continuous manual literature curation.
AtcisDB holds information on sequences for the upstream regions of genes with CRE annotations. The current version of AGRIS (August 2010) contains 33239 nuclear promoter sequences (not restricted to RNA polymerase II genes) with descriptions of putative and experimentally confirmed CREs. AtcisDB maps CREs to their respective locations in gene promoters, and displays them in a graphical form, clearly distinguishing predicted CREs from their experimentally validated counterparts.
The Arabidopsis Transcription Factor Database (AtTFDB) contains information on 1773 TFs, grouped into 50 families based on the presence of conserved domains, often involved in DNA-binding and/or dimerization. The current version of AGRIS contains 83 TFs added since the last major release (8); these TFs were identified following the same criteria as previously described (7).
Users can access information on individual TFs by browsing or searching with unique gene locus identifiers (AGI ID, Atxgxxxxx) or common gene names. The resulting table contains the AGI ID, gene synonyms, links to MIPS (9) (http://mips.helmholtz-muenchen.de/plant/athal/), SALK (10) (http://signal.salk.edu/cgi-bin/tdnaexpress) and TAIR (11) (http://arabidopsis.org/), nucleotide and protein sequences and information on the participation of the respective TF in a regulatory sub-network. The locus ID is linked to a summary page for each particular TF, which displays the most prominent information for it, including whether clones or other resources are available upon request from the Arabidopsis Biological Resource Center (ABRC, http://abrc.osu.edu/).
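As an illustration of the locus identifier format mentioned above, the following minimal sketch checks whether a query string looks like an AGI ID. The pattern (chromosome codes 1–5, plus C and M for the chloroplast and mitochondrial genomes) and the function name `is_agi_id` are assumptions made for this example; they are not the validation that AGRIS itself performs.

```python
import re

# Assumed AGI locus pattern: "AT", a chromosome code (1-5, C or M),
# "G", then five digits. Case is ignored, so "At1g01010" also matches.
AGI_PATTERN = re.compile(r"AT[1-5CM]G\d{5}", re.IGNORECASE)


def is_agi_id(query: str) -> bool:
    """Return True if the query looks like an AGI locus identifier."""
    return AGI_PATTERN.fullmatch(query.strip()) is not None
```

A search front end could use such a check to decide whether a query should be treated as a locus identifier or as a common gene name.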
The updated version of AGRIS includes two additional columns (‘Direct Targets’ and ‘Total Direct Interactions’) in order to provide integrated regulatory information for each TF (Figure 1), which is extracted from the contents of AtRegNet (see below). The ‘Direct Targets’ column gives the number of confirmed and unconfirmed targets: confirmed targets are those for which two or more experimental approaches showed direct regulation by a particular TF, and unconfirmed targets are those identified by just one experiment. The ‘Total Direct Interactions’ column gives the total number of interactions in which that particular TF is involved, including the cases where the TF is itself targeted by other regulatory factors. In graph theory language, the ‘Total Direct Interactions’ number represents the degree or valency of that particular node.
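The relationship between the two columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge list is represented as (regulator, target) pairs with duplicates already removed, and the function name `interaction_counts` and the gene names are hypothetical, not part of AGRIS.

```python
from collections import Counter


def interaction_counts(edges):
    """Summarize a directed regulatory network.

    edges: iterable of (regulator, target) pairs, one pair per interaction.
    Returns a dict mapping each gene to its number of direct targets
    (out-degree) and its total number of interactions (in- plus out-degree).
    """
    out_degree = Counter()
    in_degree = Counter()
    for regulator, target in edges:
        out_degree[regulator] += 1
        in_degree[target] += 1
    genes = set(out_degree) | set(in_degree)
    return {
        gene: {
            "direct_targets": out_degree[gene],
            "total_direct_interactions": out_degree[gene] + in_degree[gene],
        }
        for gene in genes
    }
```

For the hypothetical edges `[("TF_A", "GENE_1"), ("TF_A", "GENE_2"), ("TF_B", "TF_A")]`, `TF_A` has two direct targets and three total direct interactions, because it is also a target of `TF_B`.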
The Arabidopsis cis-regulatory element Database (AtcisDB) consists of a searchable relational database, which includes multiple data types, including TF-binding sites and complete upstream promoter sequence information. In the updated version of AGRIS, the 3000 bp upstream promoter regions of genes were obtained from TAIR9, resulting in a total of 33239 sequences. To increase scalability and extensibility, the Genome Data Visualization Toolkit (GDVTK) (12), used to display gene regulatory information on genomic regions in previous releases of AGRIS, was replaced with the Generic Genome Browser (13) (http://gmod.org/wiki/GBrowse). The Generic Genome Browser framework allows AtcisDB to be combined with the visualization and integration of any genomic data, for instance word counts and CRE location information.
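To make the notion of a 3000 bp upstream region concrete, the sketch below extracts such a region from a chromosome sequence, taking gene orientation into account. It assumes 1-based inclusive gene coordinates and measures "upstream" from the annotated gene boundary; the text does not state exactly how the TAIR9 sequences were delimited, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, start: int, end: int, strand: str,
                    length: int = 3000) -> str:
    """Return up to `length` bases upstream of a gene.

    start, end: 1-based inclusive gene coordinates, with start <= end.
    strand: "+" or "-". For "-" genes the region lies after `end` on the
    forward strand and is returned reverse-complemented, so the result
    always reads 5' to 3' toward the gene.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)   # clip at the chromosome start
        return chrom_seq[lo:start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), end + length)   # clip at the chromosome end
        return reverse_complement(chrom_seq[end:hi])
    raise ValueError("strand must be '+' or '-'")
```

Regions near a chromosome end are simply truncated in this sketch, so some returned sequences may be shorter than the requested length.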
The latest AGRIS release makes genome-wide word counting an integral component of AtcisDB (14). Using the WordSeeker (http://word-seeker.org/) enumerative word discovery approach, putative CREs can be identified de novo (without prior knowledge) by investigating their over-representation and correlation with functional (e.g. gene expression) studies. Accessible through AtcisDB, the word landscape analysis of the Arabidopsis non-coding regions for word lengths 5–15 (http://arabidopsis.med.ohio-state.edu/words/) is available for complete download or for query with particular motifs (Figure 2), with over-representation scores calculated, as previously described (14).
In the context of the WordSeeker approach, a ‘word’ is defined as a string of letters [Adenine (A), Guanine (G), Cytosine (C) and Thymine (T)], each of which is found a specific number of times in a particular genome. Due to the integration of the WordSeeker tool, AGRIS is able to provide a complete word enumeration summary of user-queried words in non-coding domains of the Arabidopsis genome, including promoter regions, introns, and 3′ and 5′ untranslated regions (3′ UTR and 5′ UTR). Furthermore, the positional distribution of selected words in all genomic segments is visually displayed. This represents an important step toward fully cataloging the functional elements of the Arabidopsis genome.
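The idea of enumerating words of length 5–15 can be illustrated with a naive counter. This is a minimal sketch, not the WordSeeker implementation: the function name `count_words` is hypothetical, overlapping occurrences are counted, only the given strand is scanned, and the over-representation scoring described in (14) is not reproduced.

```python
from collections import Counter

DNA_LETTERS = set("ACGT")


def count_words(sequences, min_len=5, max_len=15):
    """Count every word of length min_len..max_len in the given sequences.

    Words containing letters other than A, C, G and T (for example N)
    are skipped. Overlapping occurrences are each counted once.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                word = seq[i:i + k]
                if set(word) <= DNA_LETTERS:
                    counts[word] += 1
    return counts
```

A naive scan like this is adequate for short promoter sets but would need a more memory-efficient data structure to cover all non-coding regions of a genome.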
The Arabidopsis thaliana regulatory network database (AtRegNet) documents and visualizes networks formed by TFs and their direct target genes only (8). Currently, AtRegNet contains information on physical direct regulatory interactions between 8070 target genes, 64 TFs and three TF complexes. A complex in AtRegNet is defined as more than one transcriptional regulator recruited simultaneously, and often in a synergistic fashion, to DNA. The total gene regulatory network contains 8100 nodes, comprising 814 TF and 7286 non-TF encoding genes, connected by 11123 edges. The information was parsed from 76 published studies, and derived from a combination of experimental approaches, including data generated from high-throughput in vivo DNA-binding techniques such as ChIP–Chip and ChIP-Seq.
Within these 11123 edges, a set of 650 interactions was classified as ‘confirmed’ because they fulfilled the following criteria: a TF was shown to bind to the regulatory region of the target gene AND in vivo evidence of regulation of the gene by the TF was available, OR a TF was shown to directly regulate the target gene AND in vivo evidence of regulation of the gene by the TF was available. For example, consider the basic helix–loop–helix TF AtbHLH15, for which 750 direct targets were identified by ChIP–Chip (15). Of these 750 putative targets, 11 were shown to be regulated by AtbHLH15 in gene-expression microarray experiments and to be bound by AtbHLH15 in ChIP–PCR experiments. Thus, these 11 genes are considered ‘confirmed’ AtbHLH15 direct targets, and the rest were classified as ‘unconfirmed’, awaiting further studies.
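The classification rule stated above can be written as a small predicate. The sketch below is one reading of the criteria, with three boolean evidence flags per interaction; the function and parameter names are hypothetical and do not describe the actual curation pipeline.

```python
def classify_interaction(binding_shown: bool,
                         direct_regulation_shown: bool,
                         in_vivo_regulation: bool) -> str:
    """Classify a TF-target interaction as 'confirmed' or 'unconfirmed'.

    An interaction is 'confirmed' when in vivo evidence of regulation is
    available AND the TF either binds the regulatory region of the target
    or was shown to regulate it directly. Anything else is 'unconfirmed'.
    """
    if in_vivo_regulation and (binding_shown or direct_regulation_shown):
        return "confirmed"
    return "unconfirmed"
```

Under this reading, a target supported only by ChIP–Chip binding data, with no in vivo evidence of regulation, is reported as ‘unconfirmed’.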
As a custom tool to view TF networks, we developed the Regulatory Networks Interaction Module (ReIN, http://arabidopsis.med.ohio-state.edu/REIN), an interactive tool capable of integrating AtRegNet data with the AtcisDB and AtTFDB databases and expanding networks on user demand (Figure 3). ReIN complements the visualization of AtRegNet by more conventional network visualization applications, such as Cytoscape (16) (http://www.cytoscape.org).
In ReIN, nodes and edges are empowered with dynamic links that allow users to obtain additional information. For example, in the case of a node that corresponds to a TF target, the user can explore available information for this gene in AtcisDB, TAIR and AtTFDB, when appropriate. In addition, the user can expand the network by displaying, for example, targets of a target TF. Edges are linked to the abstract of the corresponding publication on PubMed that supports the relationship. ReIN also has the capability to add information provided by the user to any network displayed, and networks can be saved in a graphic format, or as TXT or XGMML files, the latter providing facile import into Cytoscape.
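For readers unfamiliar with XGMML, the following sketch writes a small directed network as an XGMML string using only the Python standard library. It is an assumption-laden illustration: the function name `network_to_xgmml` and the gene names are hypothetical, only the basic `graph`, `node` and `edge` elements are emitted, and the exact layout of the files produced by ReIN is not described in the text and may differ.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def network_to_xgmml(edges, label="network"):
    """Serialize (regulator, target) pairs as a minimal XGMML document."""
    graph = ET.Element("graph", {"xmlns": XGMML_NS, "label": label,
                                 "directed": "1"})
    node_ids = {}
    # Emit each gene once as a node, numbering nodes in order of appearance.
    for regulator, target in edges:
        for gene in (regulator, target):
            if gene not in node_ids:
                node_ids[gene] = str(len(node_ids) + 1)
                ET.SubElement(graph, "node",
                              {"id": node_ids[gene], "label": gene})
    # Emit one edge element per interaction, pointing at the node ids.
    for regulator, target in edges:
        ET.SubElement(graph, "edge", {
            "source": node_ids[regulator],
            "target": node_ids[target],
            "label": f"{regulator} (regulates) {target}",
        })
    return ET.tostring(graph, encoding="unicode")
```

The returned string can be written to a file with an `.xgmml` extension before being loaded into a network viewer.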
ReIN is accessible by clicking the ‘Regulatory Networks’ link in the title bar of all AGRIS pages, or through the ‘Total Direct Interactions’ column on the summary page obtained when displaying specific TFs or lists of TF families from AtTFDB. This column is only shown for those TFs for which information is available in AtRegNet. To fully explore the capabilities of ReIN, a tutorial is available at http://arabidopsis.med.ohio-state.edu/REIN/ReINtutorial.html.
Contents of AGRIS are freely accessible online. After a free registration process, users can download the database contents as plain text. Registration and downloads are possible through http://arabidopsis.med.ohio-state.edu/downloads.html.
National Science Foundation (MCB-0418891 to E.G.); and National Institutes of Health (5 T32 CA106196-05 to A.Y.). Funding for open access charges: Ohio State University discretionary funds (to E.G.).
Conflict of interest statement. None declared.
We would like to thank the Arabidopsis community for their support and their help in the curation of the data available through AGRIS.