PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1–6 bp long and are highly polymorphic. SSR mutations in and around coding regions affect the transcription and translation of genes, and such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other prokaryotic species have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of the strains and isolates available for 85 prokaryotic species, extracted the SSRs showing length variation and compiled them into a relational database called PSSRdb. The database provides information such as the location of PSSRs in the genomes, their length variation across genomes and the regions harboring them. This information should be useful for further research on and analysis of SSRs in prokaryotes.
Simple sequence repeats (SSRs), also known as microsatellites, are repetitive nucleotide sequences present ubiquitously in all known genomes (1–9). They consist of mono- to hexanucleotide motifs arranged in tandem. SSRs undergo high rates of insertion and deletion (INDEL) mutations of their repeat units as a consequence of slipped mispairing between the nascent and template strands during replication, and hence exhibit high polymorphism (10,11). INDEL mutations of repeat units in SSRs occur at frequencies ranging from 10⁻⁶ to 10⁻² per generation, much higher than base substitution rates (6,11–13). The effect of an SSR mutation depends on the location of the SSR relative to the organization of genes (6,14). SSRs located far from coding regions may evolve neutrally and have no effect on the structure and function of genes. On the other hand, mutations of SSRs within coding regions or near the regulatory regions of genes can have considerable effects on the translation or transcription of genes. Furthermore, the severity of the effect in coding regions depends on the repeat type and the repeat location (11). Polymorphic SSRs with repeat motifs of 3 or 6 nt in coding regions give rise to in-frame mutations, which translate into insertions or deletions of amino acid residues, whereas polymorphic SSRs with non-triplet motifs (mono-, di-, tetra- and pentanucleotide) give rise to frame-shift mutations.
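The definitions above can be illustrated with a short Python sketch that scans a sequence for perfect tandem repeats of 1–6 bp motifs and classifies a copy-number change by the net length change it causes. The minimum copy numbers in `MIN_COPIES` are illustrative assumptions, not the thresholds used for PSSRdb, and `find_ssrs`, `is_primitive` and `indel_effect` are hypothetical helpers. For a gain or loss of a single repeat unit, the frame check reduces to the rule in the text: triplet-based motifs stay in frame, all others shift the frame.

```python
import re

# Illustrative minimum copy numbers per motif length (assumed, not PSSRdb's).
MIN_COPIES = {1: 6, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str, min_copies: dict = MIN_COPIES) -> list:
    """Return (start, motif, copies) for perfect tandem repeats of 1-6 bp motifs."""
    seq = seq.upper()
    hits = []
    for k, n_min in min_copies.items():
        # A k-mer followed by at least n_min - 1 further copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{n_min - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip composite motifs such as 'AA' or 'ATAT'; their shorter unit
            # is reported under its own motif length instead.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits, key=lambda h: (h[0], len(h[1])))


def indel_effect(motif_len: int, copy_change: int) -> str:
    """Classify a repeat-copy change inside a coding region by its net length change."""
    if copy_change == 0:
        return "no change"
    return "in-frame" if (motif_len * copy_change) % 3 == 0 else "frame-shift"
```

Because the scan runs left to right, a repeat is reported in the phase in which it is first encountered (for example `CAG` rather than `AGC`), so comparisons across strains should normalize the motif phase. Note also that a dinucleotide SSR that gains or loses three units at once changes the length by 6 nt and therefore stays in frame.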
The abundance and length distribution of SSRs suggest that SSRs are suppressed in prokaryotic genomes compared with eukaryotic genomes (9). Nonetheless, some SSRs do show polymorphism, and such SSRs are known to confer beneficial effects on prokaryotes [reviewed in (6,8,14)]. The best-documented effects are SSR-mediated phase variation and antigenic variation, which many pathogens exploit to evade the host immune system and which have been studied in some bacteria (15).
Our group has been analyzing polymorphic SSRs in known prokaryotic genomes to understand the SSR-mediated evolution of pathogens. In the course of these studies, we identified and extracted SSRs that show length variation among the different strains and isolates available for 85 prokaryotic species. All data pertaining to these polymorphic SSRs (PSSRs) have been compiled into a relational database called PSSRdb. The present communication describes this database.
The complete genome sequences of all species with at least two sequenced strains were downloaded from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). PSSRs were extracted with an in-house tool called PSSRFinder (Kumar, P. and Nagarajaram, H.A., unpublished data), whose workflow is shown in Figure 1. Essentially, PSSRFinder runs BLASTN (16) to identify equivalent SSRs (SSRs with very similar or identical flanking sequences at least 50 bp long) among all the genomes available for a species. Some essential details of the method are given below:
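The central idea behind this step, treating SSRs with matching flanks in different strains as equivalent and keeping those whose repeat lengths differ, can be sketched in Python. This is not PSSRFinder itself: where PSSRFinder uses BLASTN to find very similar or identical flanks, the sketch assumes exact identity of the 50 bp flanks, and the `SSR` record and `find_pssrs` function are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class SSR(NamedTuple):
    strain: str
    start: int          # genomic start coordinate in that strain
    motif: str
    copies: int
    left_flank: str     # sequence immediately upstream of the repeat
    right_flank: str    # sequence immediately downstream of the repeat


def find_pssrs(ssrs, flank_len=50):
    """Group equivalent SSRs across strains and keep groups whose copy numbers differ.

    Equivalence here is exact identity of the flank_len bases on either side
    of the repeat (a simplification of a BLASTN similarity search).
    """
    groups = defaultdict(list)
    for s in ssrs:
        if len(s.left_flank) < flank_len or len(s.right_flank) < flank_len:
            continue  # flank too short to anchor the repeat
        key = (s.left_flank[-flank_len:], s.motif, s.right_flank[:flank_len])
        groups[key].append(s)

    pssrs = []
    for members in groups.values():
        strains = {m.strain for m in members}
        copy_numbers = {m.copies for m in members}
        # Polymorphic: found in at least two strains with at least two lengths.
        if len(strains) >= 2 and len(copy_numbers) >= 2:
            pssrs.append(sorted(members, key=lambda m: m.strain))
    return pssrs
```

Because the motif is part of the grouping key, the same repeat must be reported in the same phase in every strain; a similarity-based search such as BLASTN also tolerates small differences in the flanks, which exact matching does not.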
PSSRdb was developed using MySQL (www.mysql.com). PSSRs found in coding and non-coding regions are stored separately in two logically connected databases. The coding and non-coding databases both contain 357 tables holding information on the PSSRs, namely motif types, repeat copy numbers, genomic locations and the coding regions harboring or flanking the PSSRs. The structure of the relational tables in the coding and non-coding PSSR databases is detailed in Tables 1 and 2, respectively.
An overview of the database is shown in Figure 2. The main page contains a pull-down menu listing all 85 species. Once a species is selected, the page is updated with the list of all available strains of that species. Users can select two or more of the listed strains to query for the PSSRs found in the selected set of strains. Separate options are provided to query for PSSRs in coding regions and in non-coding regions. A query leads to a page giving the number of PSSRs found in the selected species. Each number is a link to pages containing detailed information on the corresponding PSSRs, including the sequence of the repeat motif, its genomic location and details of the region harboring it. On these pages, each listed PSSR is also hyperlinked to PRIMER3 (14) for primer design, and the coding regions harboring or flanking the PSSRs are hyperlinked to their respective annotations at NCBI (http://www.ncbi.nlm.nih.gov/).
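The storage and query scheme described above can be mimicked with Python's built-in `sqlite3` module. This is a simplified stand-in for the MySQL databases, not their actual schema (which is given in Tables 1 and 2): the table and column names (`pssr`, `allele`, `region`, `copies`) and the example data are hypothetical, coding and non-coding PSSRs share one table distinguished by a `region` column instead of living in two databases, and a PSSR is assumed to count for a strain selection when at least two of the selected strains carry different copy numbers.

```python
import sqlite3

# Hypothetical, simplified schema (not the PSSRdb tables).
SCHEMA = """
CREATE TABLE pssr (
    pssr_id INTEGER PRIMARY KEY,
    species TEXT NOT NULL,
    motif   TEXT NOT NULL,
    region  TEXT NOT NULL CHECK (region IN ('coding', 'non-coding')),
    gene    TEXT            -- coding region harboring or flanking the PSSR
);
CREATE TABLE allele (
    pssr_id   INTEGER NOT NULL REFERENCES pssr (pssr_id),
    strain    TEXT    NOT NULL,
    start_pos INTEGER NOT NULL,   -- genomic location in this strain
    copies    INTEGER NOT NULL,   -- repeat copy number in this strain
    PRIMARY KEY (pssr_id, strain)
);
"""


def count_pssrs(conn, species, strains):
    """Count, per region, PSSRs whose copy number varies among the selected strains."""
    if len(strains) < 2:
        raise ValueError("select at least two strains")
    placeholders = ",".join("?" * len(strains))
    sql = f"""
        SELECT p.region, COUNT(*)
        FROM pssr AS p
        WHERE p.species = ?
          AND p.pssr_id IN (
              SELECT a.pssr_id FROM allele AS a
              WHERE a.strain IN ({placeholders})
              GROUP BY a.pssr_id
              HAVING COUNT(DISTINCT a.copies) >= 2)
        GROUP BY p.region
    """
    return dict(conn.execute(sql, [species, *strains]).fetchall())


# Hypothetical example data for one species with three strains.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO pssr VALUES (?, ?, ?, ?, ?)", [
    (1, "species_A", "AT", "coding", "geneX"),
    (2, "species_A", "G", "coding", "geneY"),
    (3, "species_A", "CAG", "non-coding", None),
])
conn.executemany("INSERT INTO allele VALUES (?, ?, ?, ?)", [
    (1, "s1", 1200, 5), (1, "s2", 1190, 7),
    (2, "s1", 5000, 4), (2, "s2", 4990, 4), (2, "s3", 5010, 6),
    (3, "s1", 9000, 9), (3, "s3", 9020, 10),
])
```

With strains `s1` and `s2` selected, only PSSR 1 varies, so the coding count is 1; adding `s3` brings in PSSRs 2 and 3. The strain list is passed as bound parameters rather than interpolated into the SQL text, which keeps the query safe for user-supplied selections.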
As mentioned earlier, the PSSRs stored in PSSRdb were identified species by species and correspond to SSRs that show length variation among the strains and isolates available for each of the 85 species. A word of caution is in order here. Although all the prokaryotic genomes have >10× coverage, sequencing or assembly errors cannot be completely ruled out, and some SSRs may qualify as PSSRs only because of sequencing errors or mistakes made during genome assembly. Such artifacts are very difficult to identify. Nonetheless, we believe the data in PSSRdb provide a good starting point for further exploratory investigations of SSR polymorphism in prokaryotes.
Identifying PSSRs in a species has several practical advantages, and depending on the region in which a PSSR occurs, it may have different potential applications. A strain-specific PSSR (one whose SSR length varies in only one strain) could be used to identify that strain and is therefore of value in developing diagnostic kits. Genes harboring PSSRs are good candidates for studying the functional roles of genes in pathogenesis and virulence.
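The strain-specific criterion (the SSR length differs in exactly one strain) can be expressed as a small check. The sketch below uses a hypothetical helper that is not part of PSSRdb; it assumes at least three strains, because with only two strains there is no way to tell which one carries the variant length.

```python
from collections import Counter


def strain_specific(copies_by_strain):
    """Return the single strain whose copy number differs from all others, else None.

    copies_by_strain maps strain name -> repeat copy number for one PSSR.
    """
    if len(copies_by_strain) < 3:
        return None  # with two strains the odd one out is undefined
    counts = Counter(copies_by_strain.values())
    if len(counts) != 2:
        return None  # all equal, or more than two distinct lengths
    (_, _), (rare_len, n_rare) = counts.most_common()
    if n_rare != 1:
        return None  # variant length shared by several strains
    return next(s for s, c in copies_by_strain.items() if c == rare_len)
```

A PSSR passing this check would be a candidate marker for the returned strain; whether it is useful diagnostically still depends on the length difference being resolvable in the assay.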
A hyperlink will be provided to retrieve a multiple sequence alignment of each PSSR together with its flanking regions, so that users can choose the number of upstream and downstream base pairs and have the alignment computed on the fly. The database will be updated regularly as whole-genome sequences of new prokaryotes become available.
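The planned alignment feature needs, for each strain, the PSSR plus a user-chosen number of upstream and downstream bases. A minimal Python sketch of that extraction step follows; the function names, the 0-based half-open coordinate convention and the example genomes are assumptions, and the resulting FASTA text would be handed to an external multiple sequence aligner, which is not shown.

```python
def flanked_region(genome: str, start: int, end: int, upstream: int, downstream: int) -> str:
    """Return the repeat [start, end) (0-based, half-open) with extra flanking bases.

    The window is clipped at the ends of the genome sequence.
    """
    lo = max(0, start - upstream)
    hi = min(len(genome), end + downstream)
    return genome[lo:hi]


def to_fasta(records, width=60) -> str:
    """Format (name, sequence) pairs as FASTA, wrapping sequences at `width` columns."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical genomes of two strains and the PSSR coordinates in each.
genomes = {"s1": "AAAAACACACAGGGGG", "s2": "TTAAAAACACACACAGGGGGTT"}
coords = {"s1": (5, 11), "s2": (7, 15)}
fasta = to_fasta(
    (strain, flanked_region(genomes[strain], *coords[strain], upstream=3, downstream=3))
    for strain in genomes
)
```

Each strain keeps its own coordinates because the repeat length, and hence the position of the downstream flank, differs between strains.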
The work as well as the publication costs were supported by the Core fund of Centre for DNA Fingerprinting and Diagnostics (CDFD).
Conflict of interest statement. None declared.
P.K. acknowledges Senior Research Fellowship (SRF) from Council of Scientific and Industrial Research (CSIR), India.