Salmonella enterica serovar Typhi is the causative agent of typhoid fever. S. Typhi has no animal reservoir and is transmitted from a typhoid carrier only through contaminated water or food (11). The global incidence of typhoid has been estimated at 16,000,000 cases, with 500,000 deaths, per year (9). In this study, we isolated and sequenced an S. Typhi strain from a chronic carrier in a region of India where the disease is highly endemic.
Whole-genome sequencing was performed with both Roche 454 and Illumina paired-end sequencing technologies. A 4-kb genomic library was constructed, and 177,021 paired-end and 65,478 single-end reads were generated using the GS FLX Titanium system, giving ~18-fold coverage of the genome. A total of 97.09% of the reads were assembled into 11 scaffolds using Newbler (Roche). A total of ~500 Mbp of 3-kb mate pair (MP) sequencing data (100-fold coverage) were generated with an Illumina Solexa GA IIx. These sequences were mapped to the scaffolds using the Burrows-Wheeler Alignment (BWA) tool (7). Gaps were closed by sequencing PCR products. Coding sequences were predicted using the ISGA (Integrative Services for Genomic Analysis) pipeline (5) and the DIYA (Do-It-Yourself Annotator) pipeline (12), which comprises Glimmer (3), tRNAscan-SE (8), RNAmmer (6), BLAST (1), and Asgard (2). Annotation results were checked and refined using CLC Genomics Workbench.
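The coverage figures above follow from simple arithmetic: average depth is the number of sequenced bases divided by the genome length. The sketch below reproduces that calculation in Python using the replicon sizes reported for the finished assembly. The function and variable names are illustrative, not part of any pipeline used in this study. The mean 454 read length is not reported here, so the value computed below is only what the stated ~18-fold coverage would imply, not a measured quantity.

```python
def fold_coverage(total_bases: float, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Replicon sizes reported for the finished P-stx-12 assembly.
CHROMOSOME_BP = 4_768_352
PLASMID_BP = 181_431
GENOME_BP = CHROMOSOME_BP + PLASMID_BP

# Illumina mate-pair data: ~500 Mbp, as stated in the text.
illumina_depth = fold_coverage(500e6, GENOME_BP)

# 454 data: read counts are from the text. The mean read length is NOT
# reported, so it is back-calculated from the stated ~18-fold coverage
# (an assumption made purely for illustration).
READS_454 = 177_021 + 65_478
implied_mean_read_len = 18 * GENOME_BP / READS_454

print(f"Illumina depth: {illumina_depth:.0f}x")
print(f"Implied mean 454 read length: {implied_mean_read_len:.0f} bp")
```

With these inputs the Illumina depth comes out at roughly 101-fold, consistent with the ~100-fold coverage stated above.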
The complete genome of S. Typhi P-stx-12 consists of a single circular chromosome of 4,768,352 bp with a GC content of 52.1% and a 181,431-bp plasmid with a GC content of 46.4%. The chromosome contains 4,691 predicted coding sequences (CDS), 22 rRNA genes, and 76 tRNA genes, while the plasmid carries 234 protein-coding genes. Over 75% of the genes were assigned to specific clusters of orthologous groups (COG), and approximately 25% were assigned an enzyme classification number and were involved in 268 predicted metabolic pathways. A clustered regularly interspaced short palindromic repeat (CRISPR) element was detected in the chromosome at positions 2,900,675 to 2,901,069.
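As a minimal illustration of how the GC statistics above are defined, the Python sketch below computes GC content for a sequence string and combines the per-replicon values into a length-weighted genome-wide figure. The function names are hypothetical, and the handling of ambiguous bases (they are ignored) is an assumption rather than the method used for the values reported here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / (gc + at)


def weighted_gc(replicons):
    """Length-weighted mean GC over (length_bp, gc_fraction) pairs."""
    total = sum(length for length, _ in replicons)
    return sum(length * gc for length, gc in replicons) / total


# (length in bp, reported GC fraction) for the chromosome and the plasmid.
REPLICONS = [(4_768_352, 0.521), (181_431, 0.464)]
overall_gc = weighted_gc(REPLICONS)

print(f"Genome-wide GC content: {overall_gc:.1%}")
```

Because the plasmid makes up under 4% of the genome, the weighted value stays close to the chromosomal GC content.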
The genome of S. Typhi P-stx-12 differed markedly from the two previously published S. Typhi genomes, those of strain CT18, which was isolated in Vietnam (GenBank accession number AL513382), and strain Ty2, which was isolated in Russia (GenBank accession number AE014613). Comparison of the three genomes revealed that the coding genes of S. Typhi P-stx-12 were 84% similar to those of CT18 (10) and Ty2 (4). The 17 pathogenicity islands found in the two previous genomes were also identified in S. Typhi P-stx-12. This strain has one plasmid, which shares 169 orthologous CDS with pHCM1, the plasmid of CT18 (GenBank accession number AL513383). Notably, the plasmid of P-stx-12 carries genes encoding the tetracycline resistance protein and the tetracycline repressor protein TetR, possibly conferring drug resistance on this strain. Interestingly, this genome has fewer pseudogenes than CT18 and Ty2 but more hypothetical proteins.
Nucleotide sequence accession numbers. The genome sequences of S. Typhi P-stx-12 have been deposited in GenBank under accession numbers CP003278 (chromosome) and CP003279 (plasmid).