We present WebGeSTer DB, the largest database of intrinsic transcription terminators (http://pallab.serc.iisc.ernet.in/gester). The database comprises nearly a million terminators identified in 1060 bacterial genome sequences and 798 plasmids. Users can obtain both graphic and tabular results on putative terminators based on default or user-defined parameters. The results are arranged in tiers to facilitate retrieval according to specific requirements. An interactive map has been incorporated to visualize the distribution of terminators across the whole genome. Analysis of the results, both at the whole-genome level and for terminators downstream of specific genes, offers insight into the prevalence of canonical and non-canonical terminators across different phyla. The data in the database reinforce the paradigm that intrinsic termination is a conserved and efficient regulatory mechanism in bacteria. The database is freely accessible.
Transcription termination is an important regulatory step in gene expression. Every RNA polymerase that transcribes a DNA template must terminate, dissociate and release the product RNA at a defined position or region on the DNA. The RNA structure involved in this process is called a transcription terminator (1–3). In bacteria, where detailed studies have been carried out, termination is achieved by two mechanisms: intrinsic (factor-independent) and factor-dependent. The former depends primarily on the secondary structure formed in the nascent RNA and can function in a minimal in vitro system in the absence of other protein factors (4–6). In contrast, factor-dependent termination relies on proteins such as Rho and the Nus factors (7,8).
Once formed during transcription, the terminator interacts with RNA polymerase, resulting in destabilization and dissociation of the ternary elongation complex (TEC) (3,9–11). Based on studies in Escherichia coli, an intrinsic terminator is an RNA structure consisting of a guanine-cytosine (GC)-rich hairpin immediately followed by a stretch of 6–8 U residues. Although such terminators are found in many genomes, they are rare in several other genomes when these stringent criteria are applied. With the development of newer algorithms that analyse genomes using different criteria, variant (non-canonical) terminators were detected and experimentally verified (12–20). Indeed, since intrinsic termination is an ancient and conserved mechanism, it is not surprising that all bacteria rely on it as a regulatory mechanism.
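The stringent E. coli-style definition of a canonical terminator can be expressed as a simple test. The sketch below is hypothetical: it assumes the hairpin coordinates are already known (it does not check base pairing), and the stem GC threshold of 0.6 is an illustrative assumption; only the minimum of 6 U residues follows from the 6–8 U stretch described above.

```python
def gc_fraction(s: str) -> float:
    """Fraction of G or C residues in s (0.0 for an empty string)."""
    return sum(base in "GC" for base in s) / len(s) if s else 0.0


def leading_u_run(s: str) -> int:
    """Number of consecutive U residues at the start of s."""
    n = 0
    for base in s:
        if base != "U":
            break
        n += 1
    return n


def is_canonical(rna: str, hairpin_start: int, stem_len: int, loop_len: int,
                 min_gc: float = 0.6, min_u_run: int = 6) -> bool:
    """True if the stem arms are GC-rich and the hairpin is immediately
    followed by at least min_u_run uridines (hypothetical thresholds)."""
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    hairpin_end = hairpin_start + 2 * stem_len + loop_len
    left_arm = rna[hairpin_start:hairpin_start + stem_len]
    right_arm = rna[hairpin_end - stem_len:hairpin_end]
    return (gc_fraction(left_arm + right_arm) >= min_gc
            and leading_u_run(rna[hairpin_end:]) >= min_u_run)
```

A hairpin that passes the GC test but is followed by an interrupted U-trail (for example UUUAUUU) fails this stringent check, which is exactly the kind of terminator the relaxed criteria described below are designed to recover.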
The exponential increase in available genomic data has now allowed us to analyse and catalogue the terminator content of nearly 2000 sequences (chromosomal and plasmid) of bacterial origin. Here, we present WebGeSTer DB (http://pallab.serc.iisc.ernet.in/gester), the largest collection of intrinsic terminators from all completely sequenced bacterial genomes and plasmids. The database has been compiled using WebGeSTer, an improved version of GeSTer (20). At present, WebGeSTer DB contains all types of intrinsic terminators identified in the 1060 bacterial chromosomes and 798 plasmids available in the NCBI database (Table 1). In all, information on 977 173 terminators, both canonical and non-canonical, has been compiled in the database (Table 1). Terminator profiles for whole genomes as well as for individual genes can be extracted from WebGeSTer DB, and the occurrence of terminators relative to specific genes can be visualized in a high-resolution map. Furthermore, the database has a user-friendly, interactive interface that returns results in both graphic and tabular form. The parameters for the terminator search can be user-defined, and new genome sequences can be uploaded in FASTA or GenBank format for analysis.
The terminator database has been compiled using WebGeSTer, which was developed from the parent program GeSTer (20) and incorporates several improvements. Further details of WebGeSTer are available on the website. Briefly, WebGeSTer accepts sequences in both GenBank and FASTA format, extracts the region from −20 to +270 bp relative to each stop codon, and searches these regions for potential palindromic sequences. For the region downstream of every stop codon, all possible hairpins are computed and the most stable structure (the one with the most negative ΔG value) is selected as the 'Best' terminator. A genomic ΔG cut-off then selects the final set of identified terminators. For any terminator, the output provides the sequence, the genomic coordinates and structural parameters such as stem length, loop size, the sequence following the hairpin, and the mismatches and gaps. WebGeSTer identifies both canonical and non-canonical terminators and groups them. The types of terminators (Figure 1) catalogued in WebGeSTer DB are: (i) L-shaped (canonical) terminators, in which the hairpin is followed by a 10-bp trail containing more than three uridines. The four types of non-canonical terminators are: (ii) I-shaped, in which the trail following the hairpin contains three or fewer uridines; (iii) U-shaped, in which two or more hairpins occur in tandem, separated by an interval of <50 nt; (iv) X-shaped, convergent structures that function as terminators for convergently transcribed genes on opposite strands; and (v) V-shaped, two hairpins in which the second starts immediately at the end of the first. In U-, V- and X-shaped terminators, each individual structure can be L- or I-shaped. The program is adaptable: the user can change parameters such as stem length, loop size, the maximum allowed mismatches and gaps, and the search region. The core algorithm of WebGeSTer was written in PERL and produced ASCII text files.
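The window extraction and the shape rules can be sketched as follows. This is a hypothetical illustration, not WebGeSTer's PERL code: it assumes 0-based half-open coordinates, measures the −20/+270 window from the end of the stop codon, clips windows at sequence ends instead of wrapping around circular chromosomes, and reads the U-shaped "<50 nt interval" as the spacer between the end of one hairpin and the start of the next. X-shaped terminators are omitted because they need hairpins predicted for convergent genes on both strands.

```python
from dataclasses import dataclass
from typing import Optional

# Complement table for upper-case DNA (assumed input alphabet: A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def search_window(genome: str, cds_start: int, cds_end: int, strand: str,
                  upstream: int = 20, downstream: int = 270) -> str:
    """Region from -upstream to +downstream around the 3' end of the CDS
    genome[cds_start:cds_end], returned 5'->3' on the gene's own strand.
    The window is clipped at sequence ends (no circular wraparound)."""
    if strand == "+":
        lo, hi = cds_end - upstream, cds_end + downstream
        return genome[max(lo, 0):min(hi, len(genome))]
    # On the minus strand the stop codon sits at the CDS's left end.
    lo, hi = cds_start - downstream, cds_start + upstream
    return reverse_complement(genome[max(lo, 0):min(hi, len(genome))])


@dataclass
class Hairpin:
    start: int  # first base of the 5' stem arm, as an offset in the window
    end: int    # one past the last base of the 3' stem arm


def trail_type(window: str, hp: Hairpin, trail_len: int = 10) -> str:
    """'L' if the trail after the hairpin holds more than 3 U (T in DNA), else 'I'."""
    trail = window[hp.end:hp.end + trail_len].upper()
    return "L" if trail.count("T") + trail.count("U") > 3 else "I"


def pair_type(first: Hairpin, second: Hairpin, max_gap: int = 50) -> Optional[str]:
    """'V' if the second hairpin starts where the first ends, 'U' if the two
    are in tandem with a spacer shorter than max_gap nt, otherwise None."""
    gap = second.start - first.end
    if gap == 0:
        return "V"
    if 0 < gap < max_gap:
        return "U"
    return None
```

Because each hairpin in a U or V pair is labelled separately by `trail_type`, the sketch preserves the rule that the individual structures of a compound terminator can themselves be L- or I-shaped.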
From the ASCII output files, data were extracted to populate the MySQL tables. The database was built using MySQL version 5.0.84 and interfaced using PERL version 5.10.0 and PHP version 5.2.9. The figures were drawn using the GD library, version 2.45.
To evaluate the accuracy and sensitivity of WebGeSTer DB, a sample of 100 experimentally known terminators was assessed (14). The algorithm identified 91 of these terminators (Supplementary Table S1), so fewer than 10% of the known terminators (9 of 100) were missed. The detection ability of WebGeSTer was also tested by drawing a receiver operating characteristic (ROC) curve for each genome. ROC curves plot the probability of detection against the probability of false alarm at various input thresholds to the algorithm (16). The results for individual genomes are available on the webpage. Further validation of the predictions comes from analysing experimentally characterized operons (e.g. the rrn, trp, thr and his operons of E. coli K-12). For these operons, WebGeSTer correctly predicts the terminators at the 3′ end and predicts no 'false' intra-operonic terminators.
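The accuracy figures reduce to simple ratios: 91 detected out of 100 known terminators gives a sensitivity of 0.91, i.e. a false-negative rate of 9%. The sketch below computes that ratio and a set of ROC points. It is a generic, hypothetical illustration in which `scores` might be, for example, −ΔG values of candidate hairpins (higher meaning more stable) and `labels` mark experimentally confirmed terminators; it is not the procedure used to build the per-genome curves on the website.

```python
from typing import List, Sequence, Tuple


def sensitivity(detected: int, known: int) -> float:
    """Fraction of experimentally known terminators that were predicted."""
    return detected / known


def roc_points(scores: Sequence[float], labels: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(false-positive rate, true-positive rate) at each threshold; a
    candidate is called a terminator when its score is >= the threshold.
    Requires at least one positive and one negative label."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        called = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(1 for y in called if y)
        fp = len(called) - tp
        points.append((fp / negatives, tp / positives))
    return points


print(sensitivity(91, 100))  # 0.91, i.e. 9 of 100 known terminators missed
```

Lowering the threshold moves along the curve toward (1.0, 1.0): more true terminators are detected at the cost of more false alarms.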
WebGeSTer DB is the largest compilation of intrinsic terminators to date. To facilitate retrieval of data from WebGeSTer DB, the information has been arranged in tiers ranging from phyla down to the genomes of individual strains (Figure 2 and Table 2). A search initiated at a phylum (e.g. Firmicutes, Proteobacteria) can be followed down to the details of a particular terminator downstream of a specific gene of interest (Figure 3). Through a simple search module, users can retrieve the terminator profiles of either an individual genome or all member species of a given phylum/class. Information on a given genome is further subdivided into files that provide details of, for example, 'All' candidate terminators, the 'Best' candidate terminators and the different types of terminators (L, I, U, V, X). The database contains several computed features for every terminator, including its sequence, stem length, loop size, distance from the gene and ΔG (Table 3). All the information can be downloaded as zipped files from the website for further processing.
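The tiered organization (phylum, genome, gene, terminator) maps naturally onto linked relational tables. The sketch below uses Python's built-in sqlite3 in place of MySQL; all table and column names are hypothetical and the schema is an assumption, not the actual WebGeSTer DB layout. The query walks the tiers from a phylum name and a gene identifier down to that gene's 'Best' terminator.

```python
import sqlite3

# Hypothetical schema: one table per tier, linked by foreign keys.
SCHEMA = """
CREATE TABLE phylum (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE genome (id INTEGER PRIMARY KEY,
                     phylum_id INTEGER REFERENCES phylum(id),
                     organism TEXT, accession TEXT UNIQUE);
CREATE TABLE gene (id INTEGER PRIMARY KEY,
                   genome_id INTEGER REFERENCES genome(id),
                   locus_tag TEXT, strand TEXT, stop_pos INTEGER);
CREATE TABLE terminator (id INTEGER PRIMARY KEY,
                         gene_id INTEGER REFERENCES gene(id),
                         shape TEXT CHECK (shape IN ('L','I','U','V','X')),
                         is_best INTEGER, sequence TEXT, stem_len INTEGER,
                         loop_len INTEGER, distance INTEGER, delta_g REAL);
"""


def best_terminator(conn: sqlite3.Connection, phylum: str, locus_tag: str):
    """Walk the tiers: phylum -> genome -> gene -> 'Best' terminator."""
    return conn.execute(
        """SELECT g.organism, ge.locus_tag, t.shape, t.delta_g
           FROM phylum p
           JOIN genome g ON g.phylum_id = p.id
           JOIN gene ge ON ge.genome_id = g.id
           JOIN terminator t ON t.gene_id = ge.id
           WHERE p.name = ? AND ge.locus_tag = ? AND t.is_best = 1""",
        (phylum, locus_tag)).fetchall()
```

For example, after `conn = sqlite3.connect(":memory:")` and `conn.executescript(SCHEMA)`, inserting one row per tier lets `best_terminator(conn, "Firmicutes", "geneA")` return that gene's 'Best' terminator; any organism or gene names used this way are placeholders.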
WebGeSTer DB also provides whole-genome terminator maps. The genes and the different types of terminators in any genomic region of all 1858 sequences (1060 chromosomes + 798 plasmids) can be visualized with TERminator MAP (TER-MAP), an interactive map with single-gene resolution (Figure 4). In TER-MAP, genes and terminators on both strands are arranged in a linear array. 'All' identified palindromic structures are indicated and, among them, the 'Best' terminator candidates are highlighted. The user can click on a terminator of interest to reach the data for that specific terminator. Similarly, clicking on a gene leads to the corresponding NCBI record (http://www.ncbi.nlm.nih.gov/protein) for that gene and its product.
WebGeSTer uses a default set of parameters intended to maximize the number of accurate and sensitive predictions: a stem length between 4 and 30 bp, a loop size between 3 and 9 nt and a maximum of 3 mismatches (12,14,15,19). However, experiments have suggested that a stem length of 8–9 bp is sufficient for an intrinsic terminator to enforce termination (10,21,22), and most experimentally known terminators have hairpins in this range. Furthermore, in silico analyses have shown that most terminators across diverse species have a stem length between 6 and 13 bp (14,16–18). With these results in mind, the database holds two data sets for each sequence. One was generated with the default settings of GeSTer (stem length 4–30 bp, loop size 3–9 nt). For the second set, the stem length was restricted to 4–12 bp and the loop size to 3–8 nt.
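The two parameter sets can be illustrated with a brute-force hairpin search. This is a hypothetical sketch under stated assumptions: stems are ungapped (no bulges), Watson-Crick and G-U pairs are allowed, and a crude pair-count score stands in for ΔG (WebGeSTer itself computes free energies and also permits gaps). The mismatch limit for the second set is not given in the text and is assumed here to equal the default of 3. As with the 'Best' selection, only the highest-scoring hairpin is returned.

```python
from dataclasses import dataclass
from typing import Optional

# Watson-Crick and G-U wobble pairs with illustrative stability weights.
PAIR_SCORE = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2,
              ("G", "U"): 1, ("U", "G"): 1}
MISMATCH_PENALTY = -3


@dataclass(frozen=True)
class Params:
    min_stem: int
    max_stem: int
    min_loop: int
    max_loop: int
    max_mismatch: int


DEFAULT = Params(4, 30, 3, 9, 3)     # default GeSTer settings
SHORT_STEM = Params(4, 12, 3, 8, 3)  # second data set; mismatch limit assumed


@dataclass(frozen=True)
class Hit:
    start: int       # first base of the 5' stem arm
    stem: int        # stem length (bp)
    loop: int        # loop size (nt)
    mismatches: int
    score: int       # higher = more stable; a stand-in for a more negative ΔG


def best_hairpin(rna: str, p: Params) -> Optional[Hit]:
    """Highest-scoring ungapped hairpin whose innermost (loop-closing) and
    outermost base pairs are both valid and whose stem fits the limits."""
    rna = rna.upper().replace("T", "U")
    best = None
    for loop_start in range(1, len(rna)):
        for loop in range(p.min_loop, p.max_loop + 1):
            if loop_start + loop >= len(rna):
                break  # larger loops would also run off the end
            if (rna[loop_start - 1], rna[loop_start + loop]) not in PAIR_SCORE:
                continue  # the pair closing the loop must be a valid base pair
            score = mm = 0
            for k in range(p.max_stem):  # extend the stem outward
                left, right = loop_start - 1 - k, loop_start + loop + k
                if left < 0 or right >= len(rna):
                    break
                pair = PAIR_SCORE.get((rna[left], rna[right]))
                if pair is None:
                    mm += 1
                    if mm > p.max_mismatch:
                        break
                    score += MISMATCH_PENALTY
                    continue  # a stem may not end on a mismatch
                score += pair
                if k + 1 >= p.min_stem and (best is None or score > best.score):
                    best = Hit(left, k + 1, loop, mm, score)
    return best
```

The search tries every loop position and loop size and extends each stem outward until the mismatch limit is exceeded, so its cost grows roughly with window length × number of loop sizes × maximum stem length, which is manageable for a 290-nt window.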
WebGeSTer DB is unique in also housing information on several types of non-canonical terminators. There is a substantial body of experimental evidence for non-canonical terminators (I-, U-, X- and V-shaped) in many species of mycobacteria, in Streptomyces lividans and in actinophages (12,13,19,20,23). Non-canonical terminators also occur at the ends of several experimentally identified operons in diverse bacteria (17,18,24). Even the prototypical E. coli appears to have a large number of non-canonical terminators, and a mutant E. coli RNA polymerase has been shown experimentally to terminate at such terminators (25). Information about non-canonical terminators from GeSTer results has been used to define operon boundaries in the genome of S. coelicolor (26), a bacterium with few canonical terminators. Thus, because it also compiles data on non-canonical terminators, WebGeSTer DB could be a starting point for further research into the mechanism of termination and for improving genome annotation. However, some of the identified hairpins may be class I pause signals rather than non-canonical terminators. No secondary-structure prediction algorithm can distinguish between the two; distinguishing class I pause signals from non-canonical terminators requires experimentally determining the 3′ end of the RNA.
Earlier, GeSTer data were used to find terminators in the archaeon Thermococcus kodakarensis (27). WebGeSTer DB now also contains terminator profiles for 77 archaeal genomes and plasmids of archaeal origin. Archaea employ a different mechanism of transcription termination, which depends on the presence of T-rich sequences downstream of the stop codon that are transcribed into a U-stretch in the RNA (27). Because every L-shaped terminator includes a U-rich trail, the program can also detect several such archaeal terminators.
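Detecting archaeal-style signals amounts to finding runs of T in the DNA downstream of the stop codon. The sketch below is hypothetical: the minimum run of 4 T and the 50-nt search distance are illustrative assumptions, not parameters used by the database.

```python
import re
from typing import List, Tuple


def t_runs(downstream_dna: str, min_run: int = 4,
           max_dist: int = 50) -> List[Tuple[int, int]]:
    """Return (offset, length) of each run of >= min_run T residues that
    starts within max_dist nt of the stop codon (offset 0 = first base
    after the stop codon)."""
    pattern = re.compile("T{%d,}" % min_run)  # greedy, so each run is reported whole
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(downstream_dna.upper())
            if m.start() < max_dist]
```

Each reported run would be transcribed into a U-stretch, the same feature that the L-shaped terminator definition requires in the trail after the hairpin.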
Anticipating new applications (e.g. metagenomes) in which the WebGeSTer algorithm could be effective in detecting intrinsic terminators, we have upgraded WebGeSTer to accept FASTA sequences from external users (Supplementary Figure S1). This could be particularly useful to researchers who need to analyse a sequence that is not yet available in GenBank format. The database is freely accessible and will be updated regularly.
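A minimal FASTA reader for user-supplied sequences, such as metagenomic contigs, might look like the following. It is a hypothetical helper, not WebGeSTer's upload code; it assumes that the first word of each header is the identifier, and a later record with a duplicate identifier overwrites the earlier one.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: upper-case sequence}."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and legacy comment lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            current = fields[0]
            records[current] = []
        elif current is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}
```

Sequence lines may be wrapped at any width; the reader joins them, so a contig can be passed straight to a downstream terminator search.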
WebGeSTer DB also houses a detailed analysis of the structural parameters of terminators, their prevalence and their divergence. The analysis was carried out on a large sample extracted from the database, comprising representative species from 22 phyla. The salient findings are summarized below:
WebGeSTer DB is a catalogue and presentation of intrinsic terminators. The data sets in WebGeSTer DB show that intrinsic termination is a universally conserved mechanism, present in all bacterial species sequenced to date. The representative data from WebGeSTer DB agree with the experimental evidence on intrinsic termination and hence serve as a validation of the database. The database provides insight into the variations that intrinsic terminators, like other successful regulatory processes, have evolved. The compilation would be invaluable for further experiments on the mechanism of termination and for understanding gene expression in different bacteria.
Supplementary Data are available at NAR Online.
The work is supported by the Center for Excellence in Bioinformatics, Department of Biotechnology, Government of India and Center of Excellence for mycobacteria research grant, Department of Biotechnology, Government of India. Funding for open access charge: Center of Excellence for mycobacteria research, Department of Biotechnology, Government of India.
Conflict of interest statement. None declared.
We thank Rupesh Kumar and Shyam Unniraman for the design of Figure 1 and for discussions, respectively. V.N. is a recipient of the J.C. Bose fellowship of the Department of Science and Technology, Government of India.