are intracellular pathogens in cold-blooded as well as warm-blooded animals and important zoonotic agents. The genus Salmonella
is currently divided into two species: Salmonella enterica and Salmonella bongori. S. enterica is further divided into six subspecies: S. enterica subsp. enterica, S. enterica subsp. salamae, S. enterica subsp. arizonae, S. enterica subsp. diarizonae, S. enterica subsp. houtenae, and S. enterica subsp. indica. To date, more than 2,500 different serovars have been characterized, with most (1,531) classified as part of Salmonella enterica subsp. enterica
], which accounts for more than 99% of the disease in humans [1]. Serovars are characterized by their surface antigens: the O (somatic) antigens are part of the variable long-chain lipopolysaccharide located on the outer membrane, and the two H (flagellar) antigens are presented when the two flagellar structures are expressed [1]. Salmonella serovar Typhimurium and serovar Enteritidis are among the most common generalist pathovars, causing disease in a variety of animals [4
]. A smaller proportion of the serovars are host-specific and cause severe disease. S. Typhi and Salmonella Paratyphi are human-restricted, causing typhoid and paratyphoid fever, respectively [6
]. The bovine-adapted Salmonella Dublin and the porcine-adapted Salmonella Choleraesuis are occasionally seen in humans, causing severe disease [7
]. Traditionally, animal models have successfully been employed to elucidate the pathogenicity of intestinal Salmonella
], but these methods have inherent limits. Many disease mechanisms in Salmonella are host-specific, most famously the enteroinvasive behavior of S. Typhi in human infections [12], or more recently the human-adapted behavior of Salmonella Typhimurium strain D23580 [13]. In these cases, comparative genomics represents an alternative approach [14
]. Salmonella is closely related to Escherichia coli but has a large number of additional virulence genes [15]. Some of these virulence genes are located in genomic islands (GIs), which are large segments of DNA acquired by horizontal gene transfer. These GIs often display an AT content that differs from that of the rest of the S. enterica genome (which is ~48% AT) [15
]. GIs are usually located near tRNA genes, which are believed to facilitate integration of the GIs into the chromosome because of their high degree of conservation. Many Salmonella-specific GIs, termed Salmonella pathogenicity islands (SPIs), play a role in virulence and have been linked to host specificity as well as to the degree of invasiveness of the bacteria [17].
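The observation that GIs tend to differ in AT content from the rest of the chromosome can be illustrated with a simple sliding-window scan. The following is only a minimal sketch, not the method used in this study: the function names, the window and step sizes, and the deviation threshold are all illustrative assumptions, and the toy sequence is synthetic.

```python
def at_fraction(seq):
    """Fraction of A/T among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / counted


def at_deviating_windows(genome, window=1000, step=500, threshold=0.05):
    """Return the genome-wide AT fraction and the windows that deviate from it.

    A window is reported as (start, end, at_fraction) when its AT fraction
    differs from the genome-wide value by at least `threshold`.
    The default parameters are arbitrary illustrative choices.
    """
    baseline = at_fraction(genome)
    hits = []
    last_start = max(len(genome) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = genome[start:start + window]
        frac = at_fraction(chunk)
        if abs(frac - baseline) >= threshold:
            hits.append((start, start + len(chunk), frac))
    return baseline, hits


# Synthetic toy sequence: an AT-rich insert flanked by balanced sequence.
toy_genome = "ACGT" * 50 + "AT" * 50 + "ACGT" * 50
baseline, hits = at_deviating_windows(toy_genome, window=100, step=100, threshold=0.2)
print(baseline, hits)
```

On a real chromosome, candidate regions flagged this way would still need to be checked against other evidence, such as proximity to tRNA genes, since base composition alone is only a rough indicator of horizontal acquisition.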
Much research has gone into identifying Salmonella-specific genes and determining which genes are specific to the different serovars. The S. Typhi and S. Paratyphi A serovars are both adapted to the same host and cause enteric fever in humans. This study shows that they are highly homologous at the protein level. A comparison of their evolutionary relatedness has suggested that they evolved the ability to cause human-specific systemic disease by different paths. S. Paratyphi A is less diverse in terms of the proteins encoded in the genome and contains fewer pseudogenes, which indicates that it has evolved more recently than S. Typhi
]. When the complete genome sequence of S. Typhi CT18 was published, 204 pseudogenes were annotated, out of a genome of 4,599 genes [19]. This total increased later, when the second Paratyphi A genome (strain AKU_12601) was sequenced and comparative genomics revealed several additional pseudogenes in S. Typhi. Further, the two strains shared 66 pseudogenes, indicating that many of these arose through adaptation to the same niche [6]. Some of these genes have been shown to relate to virulence and gastroenteritis, leading to the hypothesis that the original function of many of these pseudogenes was to cause gastroenteritis or infection in other hosts [18].
This work represents a data-driven approach towards elucidating the differences as well as the similarities between fully sequenced Salmonella genomes. As the number of fully sequenced genomes available for analysis increases, so will the possibility of differentiating in greater detail between phenotypic characteristics such as host specificity and the degree of invasiveness. At the time of writing (late 2010), we found 45 sequenced Salmonella genomes publicly available, covering 21 serotypes within Salmonella and representing, to our knowledge, the total sum of public genomes. Of these, 22 were complete and 23 were draft sequences (consisting of many pieces or "contigs" and often with incomplete gene annotation). This study compares the sequences of the highest quality, which corresponds to 35 Salmonella genomes. We estimated the sizes of both the pan- and core genomes and illustrated the spatial distribution of core and non-core genes across the chromosome. From these data, we describe several variable gene islands in specific locations on the chromosome including, but not limited to, the SPIs [20
]. It follows that some of these unnamed gene islands are likely to play a role in Salmonella virulence and/or host specificity, even if others may be little more than inactive remnants of phage inserts.
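The notions of pan- and core genome can be made concrete with a small set-based sketch. It assumes that the genes of each genome have already been clustered into gene families by some sequence-similarity criterion. That clustering step is not described here, and the genome names and family labels below are purely hypothetical. The pan-genome is then the union of all families seen so far, and the core genome is the intersection.

```python
from typing import Dict, List, Set, Tuple


def pan_core_sizes(genomes: Dict[str, Set[str]]) -> List[Tuple[str, int, int]]:
    """Track pan- and core-genome sizes as genomes are added one at a time.

    `genomes` maps a genome name to the set of gene-family identifiers it
    contains. Returns one (name, pan_size, core_size) tuple per genome,
    in insertion order.
    """
    pan: Set[str] = set()
    core: Set[str] = set()
    progress = []
    for index, (name, families) in enumerate(genomes.items()):
        pan |= families                    # union: any family seen so far
        if index == 0:
            core = set(families)           # first genome defines the initial core
        else:
            core &= families               # intersection: families in every genome
        progress.append((name, len(pan), len(core)))
    return progress


# Hypothetical toy data: three genomes with made-up gene-family labels.
toy_genomes = {
    "genome_A": {"fam1", "fam2", "fam3", "fam4"},
    "genome_B": {"fam1", "fam2", "fam3", "fam5"},
    "genome_C": {"fam1", "fam2", "fam6"},
}
progress = pan_core_sizes(toy_genomes)
for name, pan_size, core_size in progress:
    print(name, pan_size, core_size)
```

As more genomes are added, the pan-genome can only grow and the core genome can only shrink, which is why both estimates depend on the number and diversity of genomes included.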