RegulonDB is the primary database, built on a major, internationally maintained curation of the original literature, of experimental knowledge about the elements and interactions of the network of transcriptional regulation in Escherichia coli K-12. This includes mechanistic information about operon organization and their decomposition into transcription units (TUs), promoters and their σ type, binding sites of specific transcriptional regulators (TRs), their organization into ‘regulatory phrases’, active and inactive conformations of TRs, as well as terminators and ribosome binding sites. The database is complemented with clearly marked computational predictions of TUs, promoters and binding sites of TRs. The current version has been expanded beyond specific mechanisms to include information on different growth conditions and the associated induced and/or repressed genes. RegulonDB is now linked with Swiss-Prot, with microarray databases, and with a suite of programs to analyze and visualize microarray experiments. We provide a summary of the biological knowledge contained in RegulonDB and describe the major changes in the design of the database. RegulonDB can be accessed on the web at the URL: http://www.cifn.unam.mx/Computational_Biology/regulondb/.
Escherichia coli has been a model organism since the beginning of molecular biology. Current post-genomic research in bioinformatics, network analyses and modeling, and systems biology can strongly benefit from studies in E.coli, given the large amount of accumulated knowledge of the molecular biology of this cell. This may well be the cell for which we know the most about the function of its genes, its metabolism and its transcriptional regulation. This knowledge is the foundation for the proposal within the International E.coli Alliance to achieve in E.coli, as a long-term goal, the first whole-cell model (1). We contribute to this international effort with RegulonDB, the primary database, built on a major, internationally maintained curation of the original literature, of experimental knowledge about the elements and interactions of the network of transcriptional regulation in E.coli K-12. It is a relational database containing mechanistic information about operon organization and their decomposition into transcription units (TUs), promoters and their σ type, binding sites of specific transcriptional regulators (TRs), their organization into ‘regulatory phrases’, active and inactive conformations of TRs, as well as terminators and ribosome binding sites. All this information is mapped onto the E.coli K-12 chromosome. The database is updated constantly by searching original publications, and is complemented by computational predictions. Every object has experimental evidence and a direct link to the original publication via MedLine. Previous publications explain the initial relational design and subsequent modifications (2–5).
We estimate that we have ~20–25% of all predicted interactions of the network (see the summary of the increasing content of RegulonDB by year shown in Table 1). RegulonDB has been used in different types of analyses by the scientific community, such as predictions of regulatory sites (6) and operons (7–10); complementation of other databases, specifically, the mechanistic information gathered from the literature is included in EcoCyc (11); reconstruction of metabolic pathways with regulatory information (12); analyses of the connectivity and over-represented motifs in the regulatory network of E.coli (13–14); studies identifying objective criteria that characterize and define global regulators in E.coli (15); studies on the evolution of regulatory mechanisms (16–17), as well as analyses of microarray experiments (18).
The motivation to incorporate additional information comes from the fact that experimental research in E.coli, as in any other organism, goes well beyond knowledge about the molecular biology involved in regulation and transcription. Physiological and genetic studies add a rich layer of knowledge about the internal structure of the cell. There are, for instance, a large number of publications describing the effect on the expression of specific genes of changing the growth conditions of the cell, including experiments studying the effects of deletions of regulatory genes. This genetic and physiological information provides knowledge without necessarily specifying the corresponding molecular mechanisms.
Having this information expands the utility of RegulonDB. For instance, it can be used to compare and validate microarray experiments (18). Computational genomics has grown in methods and goals, moving from a sequence-centered approach to one where regulatory networks and interactions have become the main focus. Understanding the regulatory network will be crucial in the future goal of modeling, in silico, the behavior of E.coli as an entire cell (1).
In the following, we describe how growth conditions are modeled in the database and then summarize the computational changes and additions to the database.
Free-living bacteria must constantly monitor extracellular physicochemical conditions in order to respond and modify their gene expression patterns accordingly. A series of genes whose products are involved in sensing and incorporating the different nutritional elements, as well as products sensing the concentration of toxic elements, are present in E.coli. These sensing systems are connected through metabolic intermediates to the transcriptional machinery, which in turn modifies the expression of genes whose products are involved in the response and adaptation to the corresponding changes in the environment. For the past 2 years, we have been collecting and organizing, from the original literature, information about different growth conditions and the associated observed effects on the transcription of E.coli genes. Already in the first published version of RegulonDB (2), we described in the relational design the modeling of physiological conditions and their connection to the transcriptional machinery. However, as mentioned in that paper, we were not then involved in gathering such types of information.
After an analysis of several different conditions and systems involved, we decided to implement a model where the following properties and descriptions are considered essential: (i) a general or global condition; (ii) the control condition; (iii) the specific experimental condition; (iv) the growth media used; (v) the genes affected; and (vi) the effect of the experimental condition on the expression of the affected genes (induced, repressed or no effect). Since every added object in RegulonDB is supported by associated evidence and a literature citation, we had to implement a set of criteria to classify the evidence concerning different levels of expression of genes.
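As an illustration only, the six properties listed above, together with the evidence and citation attached to every object, could be represented by a record such as the following. This is a minimal Python sketch, not the actual relational schema of RegulonDB: the class name, the field names and all example values (including the gene name `geneA`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    """The three possible outcomes named in the text."""
    INDUCED = "induced"
    REPRESSED = "repressed"
    NO_EFFECT = "no effect"


@dataclass(frozen=True)
class GrowthConditionRecord:
    """One gene observed under one experimental condition (hypothetical layout)."""
    global_condition: str        # (i) general or global condition
    control_condition: str       # (ii) control condition
    experimental_condition: str  # (iii) specific experimental condition
    growth_medium: str           # (iv) growth medium used
    gene: str                    # (v) affected gene
    effect: Effect               # (vi) effect on expression
    evidence: str                # type of supporting evidence
    reference: str               # literature citation identifier


# Placeholder values only; none of them is taken from the database.
record = GrowthConditionRecord(
    global_condition="example global condition",
    control_condition="example control",
    experimental_condition="example treatment",
    growth_medium="example medium",
    gene="geneA",
    effect=Effect.INDUCED,
    evidence="transcriptional fusion",
    reference="example citation",
)
print(record.gene, record.effect.value)
```

Restricting the effect to an enumeration mirrors the closed set of outcomes (induced, repressed or no effect) described in the model.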
To quantify gene expression, by far the most frequent methodology is that of transcriptional fusion. These studies provide quantitative information that is easy to classify. We incorporate as affected genes those with an expression change of at least a 2-fold increase or decrease. Otherwise, genes are added to the database, cautiously, as genes with ‘no effect’ or no change in expression under the specified condition. In a small fraction of cases there is no quantitative information on the level of expression of the affected gene and, therefore, its classification is not straightforward. In those cases the curator’s criterion is essential. The classification of the level of expression depends on the authors’ statements, the visual inspection of the spots in the figures in the publication, as well as, ideally, additional evidence in other publications.
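The 2-fold criterion for quantitative data can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, its arguments and the treatment of the exact 2-fold boundary as an effect are our assumptions, and the input values are made up. Cases without quantitative data, which rely on the curator's judgment, are outside the scope of the sketch.

```python
def classify_effect(control: float, experimental: float, threshold: float = 2.0) -> str:
    """Classify an expression change by fold change relative to the control.

    A ratio of at least `threshold` is called 'induced', a ratio of at most
    1/threshold is called 'repressed', and anything in between 'no effect'.
    """
    if control <= 0 or experimental <= 0:
        raise ValueError("expression values must be positive")
    ratio = experimental / control
    if ratio >= threshold:
        return "induced"
    if ratio <= 1.0 / threshold:
        return "repressed"
    return "no effect"


# Made-up expression values (arbitrary reporter units).
print(classify_effect(100.0, 250.0))  # 2.5-fold increase
print(classify_effect(100.0, 40.0))   # 2.5-fold decrease
print(classify_effect(100.0, 150.0))  # below the 2-fold cutoff
```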
Whenever available, additional information is incorporated, i.e. mechanistic properties that are already part of RegulonDB: (i) the transcription unit to which the gene belongs, the associated promoter and terminator; (ii) the regulatory protein that is involved; (iii) the set of sites in the DNA involved in regulation of transcription; (iv) the allosteric conformations and associated effectors involved; as well as (v) the intermediate metabolites or proteins that participate in the regulatory sensing mechanism. The design and a discussion of the potential applications of this corpus of knowledge are presented in more detail in a separate paper (19).
Table 2 summarizes the information that we have gathered up to September 2, 2003 concerning physiological conditions and their effect on the transcription of genes. The numbers in this table account for unique cases; thus, 327 genes have information about their expression in 83 different conditions. Since there is information for genes affected in different conditions, these genes are described a total of 679 times with their associated specific conditions.
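The distinction between unique genes, unique conditions and the total number of gene descriptions can be made concrete with a toy example. The Python sketch below uses made-up gene and condition names and a hypothetical `summarize` helper; it only shows how the three kinds of counts relate to each other and does not reproduce the actual figures.

```python
def summarize(pairs):
    """Count unique genes, unique conditions and distinct (gene, condition) pairs."""
    unique_pairs = set(pairs)
    genes = {gene for gene, _ in unique_pairs}
    conditions = {condition for _, condition in unique_pairs}
    return {
        "genes": len(genes),
        "conditions": len(conditions),
        "gene_condition_pairs": len(unique_pairs),
    }


# Made-up data: geneA is described under two different conditions.
toy_pairs = [
    ("geneA", "condition1"),
    ("geneA", "condition2"),
    ("geneB", "condition1"),
    ("geneC", "condition3"),
]
summary = summarize(toy_pairs)
print(summary)
```

Because one gene can be described under several conditions, the number of gene-condition pairs is at least as large as the number of unique genes.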
We have changed the web interface so that the main menu remains fixed throughout navigation. For instance, the ZoomTool that displays the whole genome is now shown without invoking an additional external window. We have added a new selection by functional class within the graphic display. A very useful navigation feature in the analyses of transcriptome data is the new capability of taking a file with a list of genes and getting their display in the circular genome. GETools, a suite of programs linked to the database, was specifically designed to analyze, generate graphic displays and extract information from RegulonDB, from an input based on microarray files (20).
Alignments and matrices for each transcriptional regulator have been updated, and an automatic update that runs as new sites accumulate from the literature has been implemented. The process begins by getting all the regulatory binding sites with experimental evidence; the program CONSENSUS 5c (21) is then applied to generate the corresponding weight matrix. We take the first matrix of the second cycle, where all the sequences are included. This matrix and the program PATSER 3b (22) are used to score these same known sites. From this scoring, we define alternative thresholds, available to the user, to search for similar sites in other DNA sequences. RegulonDB users can obtain these data by querying for ‘Transcriptional Regulator’.
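The general idea of building a weight matrix from known sites, scoring those same sites and deriving search thresholds can be sketched in a few lines of Python. This is a simplified illustration and not a reimplementation of CONSENSUS or PATSER: it assumes pre-aligned, equal-length, uppercase A/C/G/T sites, a uniform background and a simple pseudocount, and the two threshold definitions, the function names and the example sequences are all assumptions made for the sketch.

```python
import math
from statistics import mean, pstdev

BASES = "ACGT"


def build_weight_matrix(sites, background=None, pseudocount=1.0):
    """Build a log-odds weight matrix from aligned sites of equal length."""
    if background is None:
        background = {base: 0.25 for base in BASES}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount
    matrix = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        weights = {}
        for base in BASES:
            # Pseudocount is distributed according to the background frequencies.
            freq = (column.count(base) + pseudocount * background[base]) / total
            weights[base] = math.log(freq / background[base])
        matrix.append(weights)
    return matrix


def score_site(matrix, site):
    """Sum the position-specific weights of the bases in `site`."""
    return sum(column[base] for column, base in zip(matrix, site))


def scan(matrix, sequence, threshold):
    """Return (start, window, score) for every window scoring at least `threshold`."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_site(matrix, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Made-up aligned sites; real input would be the curated binding sites.
sites = ["TTGACA", "TTGACT", "TTTACA", "CTGACA"]
matrix = build_weight_matrix(sites)
scores = [score_site(matrix, site) for site in sites]

# Two illustrative threshold choices (assumptions, not the published ones).
thresholds = {
    "lowest_known_site": min(scores),
    "mean_minus_sd": mean(scores) - pstdev(scores),
}
hits = scan(matrix, "GGTTGACAGG", thresholds["lowest_known_site"])
print(hits)
```

Scoring the known sites with the matrix derived from them gives a distribution of scores from which a stricter or a more permissive cutoff can be chosen, depending on how many false positives are tolerable in the target sequence.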
There are two ways the user can access the information on growth conditions that affect specific genes, either through a list of conditions available in the main page, or by searching for individual genes. Furthermore, we have added links to the OU microarray database (http://www.ou.edu/microarray/Macroarray/). RegulonDB is also now linked to Swiss-Prot and Swiss-Prot has links to RegulonDB.
The information on the effect of growth conditions on gene expression will be of great value in defining and modeling functional modules in cellular physiology. Metabolic intermediates and environmental signals, functioning as allosteric effectors of transcriptional factors, are additionally available in RegulonDB. Together, this information will enable a more complete description of sets, or modules, of genes as they are expressed in E.coli in response to different environmental conditions.
An example of the use of the knowledge gathered in the database is the comparison of what RegulonDB would predict in terms of expression profiles, and what is observed in microarray experiments (19).
We have also made a proposal of diagnostic criteria to identify global regulators, where we have shown that global regulators are active in a larger number of different growth conditions than specific or dedicated regulators. This observation enriches the original requirement of global regulators to regulate genes that belong to different metabolic pathways (15).
The current expansion of data gathered and organized in RegulonDB will reinforce and contribute to the efforts of the international community in the long-term goal of modeling of the full E.coli cell (1).
We kindly ask users of RegulonDB to cite this article.
We acknowledge Rosa María Gutiérrez-Ríos and Mónica Peñaloza-Spínola for their participation in discussions on growth conditions, and Víctor del Moral and Romualdo Zayas for their computer support. This work was supported by NIH grants GM62205-02 and 1-R01-RR07861.