We present the 4.8-Mb complete genome sequence of Salmonella enterica serovar Typhi strain Ty2, a human-specific pathogen causing typhoid fever. A comparison with the genome sequence of recently isolated S. enterica serovar Typhi strain CT18 showed that 29 of the 4,646 predicted genes in Ty2 are unique to this strain, while 84 genes are unique to CT18. Both genomes contain more than 200 pseudogenes; 9 of these genes in CT18 are intact in Ty2, while 11 intact CT18 genes are pseudogenes in Ty2. A half-genome interreplichore inversion in Ty2 relative to CT18 was confirmed. The two strains exhibit differences in prophages, insertion sequences, and island structures. While CT18 carries two plasmids, one conferring multiple drug resistance, Ty2 has no plasmids and is sensitive to antibiotics.
Salmonella enterica serovar Typhi is a human-specific pathogen causing enteric typhoid fever, a severe infection of the reticuloendothelial system (8, 10, 14). Although difficult to estimate, it is thought that at least 16 million cases and 500,000 deaths occur each year around the world (8). The early administration of antibiotic treatment has proven to be highly effective in eliminating infections, but indiscriminate use of antibiotics has led to the emergence of multidrug-resistant strains of S. enterica serovar Typhi (13, 31). Since typhoid is becoming difficult to treat with conventional drugs, information about the whole genome sequence and genes of S. enterica serovar Typhi will help to reveal more specific targets for drugs aimed at disease treatment and vaccines for prevention. Although S. enterica serovar Typhi can grow in laboratory media and survive in other hosts, such as experimental mice, humans are the only known natural hosts. The mouse model is a poor representative of the human disease. Experimental infections of chimpanzees or human volunteers have been the only way to relate bacterial genetic characteristics with pathogenic effects. Consequently, what is known about S. enterica serovar Typhi pathogenicity has been largely extrapolated from studies of S. enterica serovar Typhimurium infections in mice.
In this work, we present the genome sequence of the well-studied pathogenic strain Ty2. This strain was the foundation for vaccine development and was the parent of mutant strains Ty21a and CVD908 and their derivatives, used in trials of live attenuated vaccines (16). Isolated before the emergence of drug resistance in the 1970s, it contains no plasmids. In contrast, recently isolated S. enterica serovar Typhi strain CT18 (25) carries multiple drug resistance cassettes on a large plasmid and contains a second large plasmid closely related to pMT1 of Yersinia pestis. The genome sequence of CT18 was recently determined at the Sanger Centre, Hinxton, United Kingdom (GenBank accession number AL513382), and we have used this sequence to perform a detailed comparison of the two genomes.
S. enterica serovar Typhi strain Ty2 was obtained from K. Sanderson (Salmonella Genetic Stock Center isolate 2666) and was redeposited at the American Type Culture Collection under accession number 700931. The strain was cultured and DNA was kindly prepared by R. A. Welch, University of Wisconsin.
A whole-genome shotgun library was prepared using nebulization to produce random fragments by mechanically shearing genomic DNA (21). Fragments in the size range of 1 to 2.5 kb were prepared by agarose gel electrophoresis, ends were repaired, and the fragments were cloned into the M13Janus vector (4). Random phage clones were isolated, and purified DNAs were prepared as templates for sequencing reactions by using dye terminator chemistry. Some templates were prepared by PCR directly from M13Janus clones. Sequence data were collected with ABI377 and 3700 automated sequencers, and data were assembled by using Seqman II, Genome Edition (DNASTAR, Madison, Wis.). For the finishing steps, clonal and genomic PCR techniques provided templates for primer walking to link contigs, add complementary reads, and increase coverage. The final coverage was 9.8-fold. The contig order and assembly structure were independently confirmed by using a whole-genome optical map of recognition sites for restriction enzyme NheI (18).
When the genome of S. enterica serovar Typhi strain CT18 was published, our sequence was completely enclosed in a single contig, but many ambiguities remained and the sequence was not annotated. We made use of the CT18 sequence to accelerate our analysis. Open reading frames (ORFs) in Ty2 were defined by using GeneMark.hmm (1) and compared with CT18 ORFs by using BLAST. Most of the differences were identified from the ORF comparisons and then were checked in detail by using Lasergene DNA analysis tools (DNASTAR). We compared the Ty2 genome with the CT18 genome for all ambiguities and conflicts. When our primary trace data were adequate and consistent with the CT18 sequence at a confidence level of over 70%, we corrected the ambiguities and conflicts to match the CT18 consensus sequence. In total, 2,739 of 4,742 ambiguities and conflicts were corrected by this process, leaving 2,003 in the Ty2 sequence, either because there was insufficient evidence to support such changes or because sound data confirmed real differences (about 450) between the two genomes. Every sequence difference was carefully inspected, and any ambiguity that could indicate a Ty2 feature different from CT18 was resolved. All Ty2 annotations that were different from CT18 were based on sound data. Ty2 ORFs were assigned identifiers in the form t0001.
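The correction rule described above can be sketched as a simple decision function. This is a minimal illustration under stated assumptions, not the procedure actually used: the record fields, the function name, and the representation of trace confidence as a fraction are all hypothetical. The text specifies only that a change was made when the primary trace data were adequate and consistent with CT18 at a confidence level of over 70%.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.70  # "over 70%" in the text


@dataclass
class Discrepancy:
    """One ambiguity or conflict between Ty2 and CT18 (hypothetical record)."""
    position: int          # coordinate in the Ty2 assembly
    ct18_base: str         # base in the CT18 consensus
    trace_support: float   # assumed metric: fraction of trace evidence agreeing with CT18
    trace_adequate: bool   # whether the primary trace data were judged adequate


def adopt_ct18_base(d: Discrepancy) -> bool:
    """Return True if the site would be corrected to match the CT18 consensus."""
    return d.trace_adequate and d.trace_support > CONFIDENCE_THRESHOLD


# Bookkeeping reported in the text.
total_discrepancies = 4742
corrected = 2739
remaining = total_discrepancies - corrected
print(remaining)  # 2003
```

Sites that fail the rule stay as they are, which is how both unresolved ambiguities and the real strain differences end up in the remaining set.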
The Ty2 sequence contains cross-references to the equivalent genes in the CT18 sequence and is available at GenBank under accession number AE014613.
The genome of Ty2 consists of a single, circular chromosome of 4,791,961 bp with an average G+C content of 52.05%. Figure 1 shows the circular genome map of Ty2. Base pair 1 was chosen to correspond to Escherichia coli K-12 minute zero (3). The replication origin and terminus were determined according to their homologies to those in K-12. They are near kb 3750 and 1544, respectively. The C/G distribution switches polarity at both the origin and the terminus. Ty2 has seven rRNA operons, with five copies in one replichore and two in the other (Fig. 1). However, the rRNA operons are located in opposite replichores in Ty2 and CT18 versus K-12 and S. enterica serovar Typhimurium (20, 23, 32). The Ty2 genome is distinguished from that of CT18 by a major interreplichore inversion that spans the terminus of replication and almost half of the Ty2 genome (Fig. 1 and 2). This rearrangement leaves the two replichores slightly uneven in size. The boundaries of the half-genome inversion lie within rRNA operons at the locations of rrnG and rrnH, by comparison with the arrangement in E. coli K-12; both operons were rrnG-rrnH hybrids in Ty2 and were highly likely to have mediated the inversion through homologous recombination (20). Although this recombination did not change the rRNA organization, such genome rearrangements by homologous recombination at rRNA genes often result in distinguishable ribotypes. Variations in ribotypes seem to be linked to host adaptation (32). Besides this major inversion, there is a small inverted region that is also translocated (Fig. 1, green segment in sixth circle).
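The polarity switch at the origin and terminus is the kind of signal that a cumulative GC-skew profile makes visible. The following is a minimal sketch, assuming that the C/G distribution refers to strand compositional skew measured as (G - C)/(G + C) in fixed windows. The function names, window size, and toy sequence are illustrative only; in this work the origin and terminus were assigned by homology to K-12, not by this calculation. By the usual convention, the cumulative skew reaches its minimum near the origin and its maximum near the terminus.

```python
def cumulative_gc_skew(seq, window=1000):
    """Cumulative (G - C) / (G + C) over non-overlapping windows."""
    skews = []
    running = 0.0
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:
            running += (g - c) / (g + c)
        skews.append(running)
    return skews


def skew_extrema(skews, window=1000):
    """Return the window-end coordinates of the cumulative-skew minimum and maximum."""
    i_min = min(range(len(skews)), key=skews.__getitem__)
    i_max = max(range(len(skews)), key=skews.__getitem__)
    return (i_min + 1) * window, (i_max + 1) * window


# Toy sequence: C-rich for 3 kb, G-rich for 4 kb, then C-rich for 3 kb.
toy = "ACCT" * 750 + "AGGT" * 1000 + "ACCT" * 750
profile = cumulative_gc_skew(toy, window=1000)
print(skew_extrema(profile, window=1000))  # (3000, 7000)
```

On a real circular chromosome, the profile would be computed from the finished sequence and the two extrema compared with the homology-based assignments.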
A comparison of Ty2 with CT18 reveals that more than 98% of the genome sequence is shared in these two S. enterica serovar Typhi strains. Apart from the inversion, the organization of the two genomes is quite similar, unlike the extensive rearrangements seen in a comparison of two Y. pestis strains, KIM and CO92 (7). For Ty2, we predicted 4,339 ORFs, 206 pseudogenes, and 101 RNA genes by comparison with the CT18 annotations (GenBank accession number AL513382). They average 910 bp in length and cover 88% of the genome. While 4,195 ORFs and pseudogenes are identical to those in CT18, 282 differ only in single point mutations from CT18. Together, they account for 97% of the total, a finding consistent with the results of a whole-genome nucleotide comparison. Seven Ty2 ORFs have insertions or deletions relative to CT18 ORFs that are multiples of 3 bp, resulting in longer or shorter proteins without disruption of the reading frames. Ty2 has relatively few insertion sequences (IS), having 26 copies of IS200F and only 3 copies (1 each) of other IS elements: IS1230B, IS285, and IS1351. In addition to these elements, CT18 contains three copies of IS1 (Table 1).
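The gene tally and the coverage figure can be checked with a short calculation. This is a rough sketch that assumes the 910-bp average applies to all predicted features (ORFs, pseudogenes, and RNA genes) and that overlaps between features are negligible; the variable names are illustrative.

```python
genome_bp = 4_791_961          # chromosome length reported for Ty2
orfs, pseudogenes, rna_genes = 4339, 206, 101
mean_feature_bp = 910          # average feature length reported in the text

total_features = orfs + pseudogenes + rna_genes
# Assumption: no overlap between features, so covered bases = count x mean length.
coding_fraction = total_features * mean_feature_bp / genome_bp

print(total_features)                      # 4646
print(round(coding_fraction * 100, 1))     # 88.2
```

Under these assumptions the product of feature count and mean length reproduces the stated coverage of about 88% of the genome.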
There are 29 ORFs unique to Ty2, whereas 84 are unique to CT18 (Table 2), many of them associated with putative prophages. Like the CT18 genome, the Ty2 genome contains seven regions that are prophage-like. However, they are not all identical. The relationships between the prophage regions of the two genomes are shown by the diagram in Fig. 3. Four prophages are present in identical locations in both genomes relative to the adjacent nonphage genes (Table 3). The prophage integrated near tRNAPhe in Ty2 is probably a remnant prophage; the homologous region in CT18 is not annotated as a prophage in that genome. As shown in Fig. 3, two Ty2 prophages are composed of recombined parts of CT18 prophages (or vice versa), and both genomes also have parts of prophages that are unique to those strains (Table 1). Similar observations for the prophages of O157:H7 suggested that the modular nature of prophage genomes makes a significant contribution to strain variation (24, 27).
Neither the CT18 genome nor the Ty2 genome contains the Gifsy-1 phage identified in S. enterica serovar Typhimurium LT2 (9), and only parts of Gifsy-2 are represented (the leftmost CT18 region in Fig. 3; genes t1904 to t1919 in Ty2). In both genomes, these regions are incorporated into putative prophages different from Gifsy-2 and different from each other. Gifsy-2 in LT2 encodes two proteins important for survival in macrophages, SodC (superoxide dismutase) and MsgA (function unknown). In Ty2 and CT18, the Gifsy-like regions do not contain genes for either of these, but both genes are present elsewhere in nonphage regions that are homologous in the two genomes. The previously identified LT2 phages Fels-1 and Fels-2 are not present in Ty2, although there are some regions of homology where Ty2 prophage genes are similar, as is the case for CT18 (22). Virulence-related genes iroBCDEN (Ty2 ORFs t2668 to t2672, encoding an iron uptake and storage system) and sopE (t4303, encoding an invasion-associated, type III-secreted effector protein) are located in cognate prophages in the two genomes.
Like the phage and IS differences, several lineage-specific islands were found by strain comparison (Table 1). It was not possible to ascertain whether these differences resulted from insertions into one genome or deletions from the other. CT18 ORFs STY0311 and STY0312 are not present in Ty2. The products of these ORFs are not similar to any protein in the database, but the product of STY0312 has characteristics of a secreted protein, as do the products of STY0314 and STY0316 nearby. Since secreted products often interact with extrabacterial targets, these ORFs may be related to pathogenicity. t1446 in Ty2 encodes a homolog of glutaminase (EC 3.5.1.2) that is encoded by yneH in LT2. This enzyme has roles in glutamate and nitrogen metabolism. CT18 has no equivalent gene. A 22.5-kb island at tRNAAsn in Ty2 carries t0871 and t0872, whose products are similar to a restriction enzyme and the C-terminal portion of a modification methylase enzyme of a type III system that probably is not functional, since only 173 out of 525 amino acids (aa) remain. An intact hsdRMS locus encodes the conserved Salmonella and E. coli type I restriction-modification system elsewhere in the genome.
The rRNA loci rrnG and rrnH mediated the large genomic inversion noted above. Recombination between these loci leaves hybrid rrn regions with spacer tRNAs different from those in the analogous K-12 loci, as identified by the DNA sequences flanking the rRNA genes. We found tRNAGlu at Ty2 rrnH (versus tRNAIle-tRNAAla in K-12) and the converse at rrnG. Although CT18 is colinear with K-12 in this part of the genome, it has the same spacers as Ty2 in rrnH and rrnG, suggesting that two events have occurred: an inversion followed by a reinversion restoring colinearity with K-12. Near the origin of replication, intrareplichore inversions in both the CT18 and the Ty2 genomes have resulted in hybrid rrn loci with tRNAIle-tRNAAla in CT18 and tRNAGlu in Ty2 at one hybrid site and the converse at the other site. In this region, the two genomes are colinear, so the difference in spacers again can be explained by a second recombination event. In fact, only rrnB is intact in both genomes. It is interesting that unusual rrn loci with apparently recombined spacer tRNA segments also appear among the annotated genes in Shigella flexneri strain 301 (GenBank accession number AF386526). While these events are admittedly difficult to reconstruct, it is known that rrn-mediated rearrangements are common in S. enterica serovar Typhi (19) and have been used to distinguish Typhi isolates by ribotyping (23).
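The two-event scenario can be illustrated with a toy model of signed genome segments. This is a sketch under loudly stated assumptions: the two operons are modeled as inverted repeats, the first crossover is placed within the 16S genes, and the second within the 23S genes. Those crossover positions, the segment names, and the `invert` helper are hypothetical and chosen only to show that an inversion followed by a reinversion can restore gene order while leaving the spacer tRNAs exchanged. The spacer assignments for the K-12-like starting arrangement follow the text (tRNAGlu at rrnG, tRNAIle-tRNAAla at rrnH).

```python
def invert(genome, start, end):
    """Invert genome[start..end]: reverse the segment order and flip each strand."""
    flipped = [(name, "-" if strand == "+" else "+")
               for name, strand in reversed(genome[start:end + 1])]
    return genome[:start] + flipped + genome[end + 1:]


# K-12-like arrangement: rrnG on the forward strand, rrnH on the reverse strand.
k12_like = [
    ("16S_G", "+"), ("spacer:tRNA-Glu", "+"), ("23S_G", "+"),
    ("interior", "+"),
    ("23S_H", "-"), ("spacer:tRNA-Ile-tRNA-Ala", "-"), ("16S_H", "-"),
]

# Event 1 (assumed crossover within the 16S genes): everything between them inverts.
after_inversion = invert(k12_like, 1, 5)

# Event 2 (assumed crossover within the 23S genes): only the interior flips back.
after_reinversion = invert(after_inversion, 3, 3)

for name, strand in after_reinversion:
    print(name, strand)
```

In this model the interior returns to its original orientation after the second event, yet the spacer at the first locus is now tRNAIle-tRNAAla and the spacer at the second is tRNAGlu, which is the pattern of colinearity with exchanged spacers described for CT18.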
Perhaps the most significant difference between Ty2 and CT18 is found in the selective silencing of gene functions in the form of pseudogenes. In CT18, there are 204 verified pseudogenes; 9 of these are intact genes in Ty2 (Table 4). On the other hand, Ty2 has its own unique set of 11 pseudogenes as well as 195 in common with CT18 (Table 5). Seven Ty2-specific pseudogenes result from frameshifts, while four result from disruptions by point mutations that create internal stop codons. One of the pseudogenes in CT18 (STY2012), which encodes a partial phage recombinase and which is located at a phage insertion point, is entirely lost in Ty2. A Ty2 gene (t0235) that encodes a putative chitinase has an internal stop codon and is annotated as a pseudogene in our Ty2 GenBank entry. Chitinase, related to lysozyme, is an enzyme that disrupts cell membranes by digestion of peptidoglycan linkages. In the corresponding CT18 annotation (ORF STY0257), although the DNA sequence for this gene is identical to that in the Ty2 annotation, it is stated that the stop codon is translationally suppressed by insertion of a tryptophan residue. While there has been a report of this mode of suppression at a low efficiency, we know of no evidence to support it in this particular case.
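The two classes of disruption named above, frameshifts and internal stop codons, can be distinguished with a simple check against an intact reference gene. The sketch below is a simplified illustration, not the annotation procedure used here: it assumes both sequences run from the start codon to the position aligned with the reference stop codon, it infers a frameshift only from a net length difference that is not a multiple of 3, and the function name and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_orf(ref_cds, query_cds):
    """Classify query_cds relative to an intact reference coding sequence."""
    ref_cds, query_cds = ref_cds.upper(), query_cds.upper()
    # A net length change that is not a multiple of 3 shifts the reading frame.
    if (len(query_cds) - len(ref_cds)) % 3 != 0:
        return "frameshift"
    # Look for stop codons before the final codon.
    codons = [query_cds[i:i + 3] for i in range(0, len(query_cds) - 3, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        return "internal stop"
    if len(query_cds) != len(ref_cds):
        return "in-frame indel"
    return "intact"


reference = "ATGAAACCCGGGTAA"
print(classify_orf(reference, "ATGAAACCCGGGTAA"))  # intact
print(classify_orf(reference, "ATGAAATAGGGGTAA"))  # internal stop
print(classify_orf(reference, "ATGAACCCGGGTAA"))   # frameshift
print(classify_orf(reference, "ATGAAACCCTAA"))     # in-frame indel

# Pseudogene bookkeeping reported in the text.
ct18_pseudogenes = 204
intact_in_ty2 = 9
ty2_specific = 11
shared = ct18_pseudogenes - intact_in_ty2
ty2_pseudogenes = shared + ty2_specific
print(shared, ty2_pseudogenes)  # 195 206
```

A real comparison would work from alignments, since compensating insertions and deletions can leave the net length unchanged while still disrupting part of the reading frame.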
Table 6 shows genes in Ty2 that have changes causing their coding frames to diverge from those in CT18. For example, gltX, encoding glutamyl-tRNA synthetase, is intact in Ty2, but in CT18, a frameshifting deletion results in a predicted protein three residues longer at the C terminus. Since gltX is an essential gene, the extra residues presumably do not affect the synthetic activity appreciably. Some of the pseudogenes listed in Tables 4 to 6 also may have a role in pathogenesis. In addition to the loss of 7 out of 12 fimbrial operons in both strains, Ty2 has lost 2 more (stcC and stbC) but has gained fimI. Three other genes that are intact in Ty2 may be associated with pathogenicity. TtrS is a sensor for tetrathionate, an alternative electron acceptor in vitamin B12-dependent anaerobic growth, and may be important for intracellular survival. SopE2 is secreted by a type III mechanism into host cells, where it is involved in actin rearrangements (12), a common first step in bacterial attack or invasion of host cells. WcaA is thought to be a glycosyltransferase involved in the synthesis of colanic acid, which is secreted to form the exopolysaccharide capsule providing protection from, for example, dehydration, acid stress, and osmotic stress, conditions encountered both inside and outside the human host. This capsule is also implicated in another pathogenic mechanism, biofilm formation (6, 28).
Differences between the loci for nitrate reductase Z (NR-Z) and RpoS in Ty2 and CT18 may seem to indicate significant differences in their abilities to thrive in anaerobic conditions, but a detailed investigation yielded more questions than answers. NR-A and NR-Z complexes provide electron transport during anaerobic respiration. CT18 has intact genes in both loci. In Ty2, the NR-A locus is intact, but the Z components encoded by narW and narV are fused by an in-frame deletion. Since both gene products are essential for a functional Z complex, this pathway is probably inactivated in Ty2, although the consequences are unclear. However, S. enterica serovar Typhi mutants deficient in anaerobic respiration are also less capable of intracellular replication (5), a factor which would compromise virulence.
The structure and function of the Z complex are understood, but its role is obscure. The narZYWV genes (product, NR-Z) are homologs of the genes of the NR-A locus, narGHIJ, which are active in anaerobic respiration. The four genes encode the alpha, beta, delta, and gamma components, respectively. By analogy with NR-A, the alpha component contains the catalytic site and requires a molybdenum cofactor for activity. The beta component contains an iron-sulfur center and transfers electrons from the gamma component to the alpha component. Both alpha and beta are cytoplasmic proteins. Paired gamma chains form a membrane anchor, and heme-iron centers are embedded in the membrane. Electrons are accepted by the gamma chains from quinone and then transferred to the beta protein, which in turn transfers them to the alpha subunits, where the reduction reaction is completed. The delta protein is not actually part of the complex but is essential for activity; it is thought to be important in assembling the complex. In Ty2 NarW, the C-terminal 54 aa out of 236 are deleted, and in NarV, aa 1 to 194 out of 225 are lost, including all of the transmembrane segments that form the membrane anchor.
NR-Z was thought to be expressed constitutively at a low level and was known not to be induced during anaerobic growth (11). Recently, the locus in S. enterica serovar Typhimurium was shown to be induced by carbon starvation (stationary phase) and to be RpoS dependent (34). NR-Z is also essential for starvation-induced heat and acid tolerance. NR-Z expression is not induced during anaerobic growth but is actually repressed by Fnr, the oxygen-sensing regulator. Active hybrid enzymes mixing NR-A and NR-Z subunits have been obtained (2); therefore, it is possible that the NR-A membrane anchor tethers the NR-Z enzymes and that NarJ replaces the mutated NarW, if they are expressed under the same conditions as NR-Z.
However, rpoS is also abnormal in Ty2 (although it is intact in CT18). The alternative sigma factor encoded by this gene regulates more than 30 genes in the stress response. In Ty2, a frameshift at aa 312 replaces the last 12 aa of the wild-type product with 74 different aa. This change was discovered when mutations were introduced into Ty2 in efforts to attenuate its virulence for the creation of a safe vaccine strain, called Ty21a (29) or CVD908 (17). Subsequently, attenuation proved to be due partly to a mutant RpoS which had not been deliberately introduced. The mutation was later shown to be present in the Ty2 parent strain as well (30). Ty21a survived starvation and other stresses poorly, an advantage in a vaccine strain, and these deficiencies were complemented by an introduced wild-type RpoS. Unfortunately, no equivalent data exist for Ty2; therefore, we cannot exclude the possibility that other mutations in Ty21a, either engineered or accidental, have influenced the stress response of Ty21a, nor can we determine whether there is any direct effect on the virulence of Ty2.
We obtained Ty2 for sequencing from a Salmonella archive to minimize possible effects of repeated passaging. However, variant rpoS genes have been identified in archived stocks of Salmonella and have also been observed in E. coli K-12 W3110 (35). Some spontaneous mutants that arise during prolonged starvation are associated with a growth advantage (38). These observations suggest that mutation of rpoS under extreme stress itself may be a stress response, permitting the selection of a more efficient transcription factor that can improve fitness to respond to adverse conditions.
Interestingly, the genomic region between rpoS and mutS is highly plastic in members of the family Enterobacteriaceae (15), with many rearrangements in this interval. In S. enterica serovar Typhimurium, the expression of the Spv proteins, necessary for bacterial survival inside invaded host cells, is under RpoS control. The Spv genes, carried on a plasmid in serovar Typhimurium, are not present in serovar Typhi, and little is known about how different host tissue invasion mechanisms may be in serovar Typhi or how dependent on RpoS they may be. In fact, the extended RpoS protein in Ty2 retains intact both the RNA polymerase core binding site and the putative DNA binding site that together bring about enhanced binding of the polymerase to the DNA template. Stationary-phase (starvation) responses are under the control of RpoE (a different alternative sigma factor) (36), as are responses to other stimuli important in virulence, such as oxidative stress imposed by macrophage defenses. It is possible that these two transcription factors have some overlapping effects, making it very difficult to predict the effect of this particular mutant RpoS. It is interesting that S. flexneri 2a strain 2457T also has mutations in the narZYWV locus as well as a mutant rpoS (unpublished data). A frameshift changes the C-terminal 30 aa and truncates the transcript, whereas another 2a strain, 301, has also lost narZ but retains rpoS intact. There is much to be learned yet.
Clearly, while genome sequencing has revealed many genes with potentially important contributions to pathogenicity, discovering the details and deciphering the message, if any, in individual sets of pseudogenes will require extensive research in many laboratories, each with a specific expertise. A more appropriate model system is also needed. Newer techniques, such as microarray analysis of gene expression, have begun to provide the next level of information about genes involved in pathogenesis. Of the many knockout mutations already constructed in K-12 strain MG1655 in our laboratory, rpoS mutations should be very informative. A set of characterized mutations in an isogenic background should help to unravel the complex aspects of the stress response and provide clues as to how NR-Z may be involved.
Differences in pseudogene content between CT18 and Ty2 fall into no discernible pattern or functional relationship. These differences may have arisen due to variations in stresses applied by human host defense systems and may contribute subtle effects to the complex mechanisms of pathogenesis used by these two strains. They may also reflect a need to adjust the balance of metabolic capabilities to optimize virulence, perhaps achievable by more than one possible combination of genes.
When analyzing pseudogenes, as emphasized by McClelland et al. (22), investigators should be aware that pseudogenes may be identified with confidence only when an intact homolog is found in a closely related (and sequenced) genome, and even then, annotation criteria vary. Some, but not all, annotations designate genes disrupted by IS elements as pseudogenes. It is often unclear whether small differences in protein structure, such as the three extra residues in GltX mentioned above, will eliminate function. Criteria based strictly on gene structure are not appropriate for secreted or surface proteins, such as those of fimbriae, whose genes are normally more variable than the conserved genes nearby (27), with the obvious advantage of evading the host immune system or potentially increasing the ability of the bacterium to attach to and affect host tissues.
How can we account for the accumulation of pseudogenes? Although mutagenic processes such as transposase induction are triggered by the kinds of stresses that bacteria undergo in repeated passages in the laboratory, strains that have been used in laboratories for the longest periods, E. coli K-12 and S. enterica serovar Typhimurium strain LT2, have almost 1 order of magnitude fewer pseudogenes than strains of intracellular pathogens that have also been used in laboratories for long periods, Y. pestis KIM, S. flexneri 2457T, and Ty2. The largest numbers of pseudogenes have been observed primarily in host-adapted pathogens that grow intracellularly and are thought to result from an adaptive process (33). In S. enterica serovar Typhi, adaptive changes have limited the host range to humans and (presumably) inactivated metabolic functions that are not needed for intracellular growth (26, 37) or survival in the intestine. These differences, as well as any of the far more numerous differences between S. enterica serovar Typhi and serovar Typhimurium strains, may underlie disease characteristics that are overtly or subtly distinctive of the pathogenic potential of the strains. It is important that each difference is examined with expert knowledge to identify the genetic variables that may yield valuable information through experimental evaluation. This goal may still be costly and difficult to carry out without an animal model that reproduces the human disease accurately and has reasonable costs. Inspection of every pseudogene for the possibility of residual or altered activity is not a trivial task; in many cases, even this initial test is not immediately possible, since researchers may have no idea of the function of encoded proteins or of which parts of encoded proteins may be essential for function.
We thank the University of Wisconsin Genome Sequencing Team members for excellent technical work.
This work was supported by NIH/NIAID grant AI44387 to F.R.B.
†Paper 3604 from the Laboratory of Genetics, University of Wisconsin.