The number of fully sequenced genomes has grown exponentially over the past few years. Prokaryote species occupy a huge variety of environments and differ in metabolic and genomic complexity. However, prokaryote genomes share common architectural principles [1]. They contain protein-coding genes, structural RNAs and spacers between genes, which are thought to typically contain regulatory signals [2] and the origin of replication sequence [3]. These spacers tend to be short because of selective pressure to minimize non-functional DNA in prokaryotes [2,4]. A consistent feature of these genomes is that the coding sequences of adjacent genes often overlap [5]. Under this genomic compactness, attributed to their physically small environments, overlapping genes must obey the constraints imposed by the structure of the genetic code, and the spacers between genes must adapt their lengths to the requirements of the regulatory signals [2].
One of the regulatory signals found between genes is the Shine-Dalgarno (SD) sequence [6]. The SD sequence is the motif 5'-GGAGG-3', located upstream (5') of the initiation codon and complementary to the sequence 5'-CCUCC-3' at the 3' end of the 16S rRNA [6]. Translation initiation does not require an exact distance between the SD sequence and the start codon. However, studies have shown that gene expression decreases drastically when the SD lies within 4 nucleotides of the initiation codon or as far as 13 nucleotides from it [7-9]. Prokaryote species seem to have preferred distances between the SD and the start codon, and these distances vary among species [10], although the SD has been found mostly between the 7th and the 12th base upstream of the start codon [10-12]. The location of the SD can help to correct gene annotations [13] and could influence spacer length and stop codon usage [14].
Prokaryote genomes contain a large number of overlapping genes [15-19]. Overlap lengths tend to be short because of selective pressure against long overlaps: long overlapping reading frames increase the risk of deleterious mutations, since a single mutation in the shared region can affect two genes. Co-directional overlaps (both genes on the same strand) are the most common, reflecting the fact that this is the most frequent orientation for a gene pair, owing to the tendency of prokaryotic genes to be grouped in operons [20-22]. Among co-directional overlaps, the 4-bp overlap is extremely common [5,15,23,24]; it allows the upstream stop codon and the downstream start codon to overlap (as in ATGA, where ATG starts the downstream gene and TGA ends the upstream one), and such gene pairs are thought to be translationally coupled [25]. Co-directional and divergent overlapping genes can arise by 5'-end elongation, when the downstream gene adopts a new start codon within the upstream coding sequence [23], whereas co-directional and convergent overlapping genes can arise by 3'-end extension after the loss of a stop codon [16]. Overlaps in prokaryotes have been hypothesized to reduce genome size, thereby increasing the density of genetic information [17,24,26-28], and to regulate gene expression through translational coupling of functionally related polypeptides [5,24,26,29,30]. In addition, overlapping gene pairs have been used as genetic markers for phylogenetic inference because of their high conservation [31,32]. Overlapping genes are better conserved across species than non-overlapping genes [19], and the extent of conservation of overlapping pairs correlates with the evolutionary distance between the species compared [15].
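The stop/start 4-bp overlap can be illustrated with a minimal Python sketch that measures the overlap between two co-directional genes and tests whether it has this form. It assumes both genes lie on the forward strand, uses 1-based inclusive coordinates (as in GenBank feature tables), assumes the upstream gene's annotated end includes its stop codon, and restricts start codons to ATG, GTG and TTG. The sequence and the function names are hypothetical and chosen for illustration only.

```python
# Minimal sketch: overlap between two co-directional genes on the + strand.
# Assumptions (illustrative only): 1-based inclusive coordinates, the
# upstream gene's annotated end includes its stop codon, and starts are
# restricted to ATG/GTG/TTG.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def overlap_length(up_end: int, down_start: int) -> int:
    """Number of bases shared by the two genes (0 if they do not overlap)."""
    return max(0, up_end - down_start + 1)


def overlap_sequence(genome: str, up_end: int, down_start: int) -> str:
    """The shared bases themselves ('' if the genes do not overlap)."""
    if overlap_length(up_end, down_start) == 0:
        return ""
    return genome[down_start - 1:up_end]


def is_4bp_stop_start_overlap(genome: str, up_end: int, down_start: int) -> bool:
    """True if the genes share exactly 4 bp, with the downstream start codon
    on overlap bases 1-3 and the upstream stop codon on bases 2-4."""
    if overlap_length(up_end, down_start) != 4:
        return False
    stop = genome[up_end - 3:up_end]                # last codon of upstream gene
    start = genome[down_start - 1:down_start + 2]   # first codon of downstream gene
    return stop in STOP_CODONS and start in START_CODONS


# Toy example (hypothetical sequence): upstream gene 1..12 = ATG AAA CCA TGA,
# downstream gene 9..17 = ATG AGC TAA.
genome = "ATGAAACCATGAGCTAA"
print(overlap_length(12, 9), overlap_sequence(genome, 12, 9))  # 4 ATGA
print(is_4bp_stop_start_overlap(genome, 12, 9))                 # True
```

In the toy sequence the two genes share exactly the bases `ATGA`: the downstream ATG occupies the first three of them and the upstream TGA the last three, so the downstream gene is read in a shifted frame relative to the upstream one.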
Overlapping genes, a common structure of prokaryote genomes, and the spacers between genes are therefore structural features worth studying. However, the analysis of both is often affected by genome annotation errors [33-35]. Accurate annotation would facilitate experimental work as well as the bioinformatic analysis of gene regulation and gene structure [36]. This interactive database stores all the overlapping genes and spacers of 678 fully sequenced prokaryote genomes. Its aim is to provide users with information about overlapping genes and the spacer lengths between adjacent genes. Users can analyse the conservation of overlaps across species, as well as the presence and location of the SD within intergenic regions or overlapping sequences. Naturally, the quality of this information depends on the quality of the genome annotations; indeed, the database can also be used to investigate suspected annotation errors, such as wrong initiation sites or false gene predictions.
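Spacer and overlap lengths between adjacent genes follow directly from annotated coordinates. The minimal Python sketch below shows one way they could be derived together with the orientation of each pair; it is not the database's actual pipeline. The `Gene` record, the 1-based inclusive coordinate convention and the sign convention (a negative distance denotes an overlap) are assumptions for illustration. Genes nested entirely within another gene, and the pair spanning the origin of a circular chromosome, are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record (1-based, inclusive coordinates)."""
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def orientation(left: Gene, right: Gene) -> str:
    """Relative orientation of two neighbouring genes."""
    if left.strand == right.strand:
        return "co-directional"
    # +/- : 3' ends face each other; -/+ : 5' ends face each other
    return "convergent" if left.strand == "+" else "divergent"


def adjacent_pairs(genes: list[Gene]) -> list[tuple[str, str, str, int]]:
    """(left, right, orientation, distance) for each pair of neighbours.

    distance > 0: spacer length in bp; 0: genes abut; < 0: -distance bp of
    overlap. Nested genes and the pair spanning the origin of a circular
    chromosome are not handled.
    """
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    return [
        (a.name, b.name, orientation(a, b), b.start - a.end - 1)
        for a, b in zip(ordered, ordered[1:])
    ]


# Hypothetical annotation, deliberately given out of order.
genes = [
    Gene("c", 30, 50, "-"),
    Gene("a", 1, 12, "+"),
    Gene("d", 45, 80, "+"),
    Gene("b", 9, 17, "+"),
]
for row in adjacent_pairs(genes):
    print(row)
```

For this toy annotation the sketch prints `('a', 'b', 'co-directional', -4)`, `('b', 'c', 'convergent', 12)` and `('c', 'd', 'divergent', -6)`: a 4-bp co-directional overlap, a 12-bp spacer and a 6-bp divergent overlap. Keeping spacers and overlaps on a single signed scale makes it straightforward to tabulate both from the same annotation.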