The genome of Ty2 consists of a single circular chromosome of 4,791,961 bp with an average G+C content of 52.05%. Figure shows the circular genome map of Ty2. Base pair 1 was chosen to correspond to Escherichia coli K-12 minute zero (3). The replication origin and terminus were identified by their homology to those of K-12 and lie near kb 3750 and kb 1544, respectively. The C/G distribution switches polarity at both the origin and the terminus. Ty2 has seven rRNA operons, with five copies in one replichore and two in the other (Fig. ). However, the rRNA operons are located in opposite replichores in Ty2 and CT18 versus K-12 and
S.
enterica serovar Typhimurium (
20,
23,
32). The Ty2 genome is distinguished from that of CT18 by a major interreplichore inversion that spans the terminus of replication and almost half of the Ty2 genome (Fig. and ). This rearrangement leaves the two replichores slightly uneven in size. The boundaries of the half-genome inversion lie within rRNA operons at the locations of
rrnG and
rrnH, by comparison with the arrangement in
E.
coli K-12; both operons were
rrnG-rrnH hybrids in Ty2 and were highly likely to have mediated the inversion through homologous recombination (
20). Although this recombination did not change the rRNA organization, such genome rearrangements by homologous recombination at rRNA genes often result in distinguishable ribotypes. Variations in ribotypes seem to be linked to host adaptation (
32). Besides this major inversion, there is a small inverted region that is also translocated (Fig. , green segment in sixth circle).
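As a rough check on the statement that the two replichores are slightly uneven in size, the two origin-to-terminus arcs can be estimated from the approximate coordinates given above. The following is a minimal sketch only: the origin and terminus positions are rounded to the nearest kilobase, as in the text, and `replichore_lengths` is a hypothetical helper name introduced here for illustration.

```python
# Values taken from the text. The origin and terminus are reported only
# approximately ("near kb 3750 and 1544"), so the results are estimates.
GENOME_BP = 4_791_961
ORIGIN_BP = 3_750_000
TERMINUS_BP = 1_544_000


def replichore_lengths(genome_bp, origin_bp, terminus_bp):
    """Return the lengths of the two origin-to-terminus arcs of a circular chromosome."""
    # Arc that leaves the origin in the direction of increasing coordinates,
    # wrapping past base pair 1 when the terminus has the smaller coordinate.
    forward = (terminus_bp - origin_bp) % genome_bp
    # The other arc is whatever remains of the circle.
    reverse = genome_bp - forward
    return forward, reverse


forward, reverse = replichore_lengths(GENOME_BP, ORIGIN_BP, TERMINUS_BP)
for name, length in (("origin -> bp 1 -> terminus", forward),
                     ("terminus -> origin", reverse)):
    share = 100 * length / GENOME_BP
    print(f"{name}: {length:,} bp ({share:.1f}% of the genome)")
```

With these rounded coordinates, the arcs come to roughly 2.59 Mb and 2.21 Mb. The exact values depend on where the origin and terminus are placed within their respective regions.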
A comparison of Ty2 with CT18 reveals that more than 98% of the genome sequence is shared in these two
S.
enterica serovar Typhi strains. Apart from the inversion, the organization of the two genomes is quite similar, unlike the extensive rearrangements seen in a comparison of two
Y.
pestis strains, KIM and CO92 (
7). For Ty2, we predicted 4,339 ORFs, 206 pseudogenes, and 101 RNA genes by comparison with the CT18 annotations (GenBank accession number
AL513382). The predicted ORFs average 910 bp in length and cover 88% of the genome. While 4,195 ORFs and pseudogenes are identical to those in CT18, 282 differ from their CT18 counterparts only by single point mutations. Together, they account for 97% of the total, a finding consistent with the results of a whole-genome nucleotide comparison. Seven Ty2 ORFs have insertions or deletions relative to CT18 ORFs that are multiples of 3 bp, resulting in longer or shorter proteins without disruption of the reading frames. Ty2 has relatively few insertion sequences (IS), having 26 copies of IS
200F and only 3 copies (1 each) of other IS elements: IS
1230B, IS
285, and IS
1351. In addition to these elements, CT18 contains three copies of IS
1 (Table ).
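The distinction drawn above between length differences that are multiples of 3 bp and those that are not can be expressed as a simple rule. The sketch below is illustrative and is not the procedure used for the annotation: `classify_indel` is a hypothetical helper, and the example length differences are invented rather than taken from the Ty2 and CT18 ORFs.

```python
def classify_indel(length_change_bp):
    """Classify the length difference between an ORF and its counterpart.

    length_change_bp is (length in strain A) - (length in strain B), in bp.
    Returns a (label, codon_change) pair; codon_change is None for a
    frameshift because the downstream protein sequence is not comparable.
    """
    if length_change_bp == 0:
        return ("identical length", 0)
    if length_change_bp % 3 == 0:
        # Whole codons gained or lost: the reading frame is preserved and
        # the protein is simply longer or shorter.
        return ("in-frame", length_change_bp // 3)
    # Any other difference shifts the reading frame downstream of the indel.
    return ("frameshift", None)


# Invented example values, for illustration only.
for change in (0, 3, -6, 1, -4):
    print(change, classify_indel(change))
```

This rule considers only the net length difference. Two compensating indels within one gene, for example, would need a sequence alignment to detect.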
Ty2 prophages. There are 29 ORFs unique to Ty2, whereas 84 are unique to CT18 (Table ), many of them associated with putative prophages. Like the CT18 genome, the Ty2 genome contains seven regions that are prophage-like. However, they are not all identical. The relationships between the prophage regions of the two genomes are shown by the diagram in Fig. . Four prophages are present in identical locations in both genomes relative to the adjacent nonphage genes (Table ). The prophage integrated near tRNA
Phe in Ty2 is probably a remnant prophage; the homologous region in CT18 is not annotated as a prophage in that genome. As shown in Fig. , two Ty2 prophages are composed of recombined parts of CT18 prophages (or vice versa), and both genomes also have parts of prophages that are unique to those strains (Table ). Similar observations for the prophages of O157:H7 suggested that the modular nature of prophage genomes makes a significant contribution to strain variation (
24,
27).
| TABLE 3. Phage regions similar in Ty2 and CT18 |
Neither the CT18 genome nor the Ty2 genome contains the Gifsy-1 phage identified in
S.
enterica serovar Typhimurium LT2 (
9), and only parts of Gifsy-2 are represented (the leftmost CT18 region in Fig. ; genes t1904 to t1919 in Ty2). In both genomes, these regions are incorporated into putative prophages different from Gifsy-2 and different from each other. Gifsy-2 in LT2 encodes two proteins important for survival in macrophages, SodC (superoxide dismutase) and MsgA (function unknown). In Ty2 and CT18, the Gifsy-like regions do not contain genes for either of these, but both genes are present elsewhere in nonphage regions that are homologous in the two genomes. The previously identified LT2 phages Fels-1 and Fels-2 are not present in Ty2, although there are some regions of homology where Ty2 prophage genes are similar, as is the case for CT18 (
22). Virulence-related genes
iroBCDEN (Ty2 ORFs t2668 to t2672, encoding an iron uptake and storage system) and
sopE (t4303, encoding an invasion-associated, type III-secreted effector protein) are located in cognate prophages in the two genomes.
Other differences. In addition to the phage and IS differences, several lineage-specific islands were found by strain comparison (Table ). It was not possible to ascertain whether these differences resulted from insertions into one genome or deletions from the other. CT18 ORFs STY0311 and STY0312 are not present in Ty2. The products of these ORFs are not similar to any protein in the database, but the product of STY0312 has characteristics of a secreted protein, as do the products of STY0314 and STY0316 nearby. Since secreted products often interact with extrabacterial targets, these ORFs may be related to pathogenicity. Ty2 ORF t1446 encodes a homolog of the glutaminase (EC 3.5.1.2) that is encoded by yneH in LT2. This enzyme has roles in glutamate and nitrogen metabolism. CT18 has no equivalent gene. A 22.5-kb island at tRNAAsn in Ty2 carries t0871 and t0872, whose products are similar to a restriction enzyme and to the C-terminal portion of a modification methylase of a type III system. This system is probably not functional, since only 173 of the 525 amino acids (aa) of the methylase remain. An intact hsdRMS locus elsewhere in the genome encodes the conserved Salmonella and E. coli type I restriction-modification system.
The rRNA loci
rrnG and
rrnH mediated the large genomic inversion noted above. Recombination between these loci leaves hybrid
rrn regions with spacer tRNAs different from those in the analogous K-12 loci, as identified by the DNA sequences flanking the rRNA genes. We found tRNA
Glu at Ty2
rrnH (versus tRNA
Ile-tRNA
Ala in K-12) and the converse at
rrnG. Although CT18 is colinear with K-12 in this part of the genome, it has the same spacers as Ty2 in
rrnH and
rrnG, suggesting that two events have occurred: an inversion followed by a reinversion restoring colinearity with K-12. Near the origin of replication, intrareplichore inversions in both the CT18 and the Ty2 genomes have resulted in hybrid
rrn loci with tRNA
Ile-tRNA
Ala in CT18 and tRNA
Glu in Ty2 at one hybrid site and the converse at the other site. In this region, the two genomes are colinear, so the difference in spacers again can be explained by a second recombination event. In fact, only
rrnB is intact in both genomes. It is interesting that unusual
rrn loci with apparently recombined spacer tRNA segments also appear among the annotated genes in
Shigella flexneri strain 301 (GenBank accession number
AF386526). While these events are admittedly difficult to reconstruct, it is known that
rrn-mediated rearrangements are common in
S.
enterica serovar Typhi (
19) and have been used to distinguish Typhi isolates by ribotyping (
23).
Differences among pseudogenes in CT18 and Ty2. Perhaps the most significant difference between Ty2 and CT18 is found in the selective silencing of gene functions in the form of pseudogenes. In CT18, there are 204 verified pseudogenes; 9 of these are intact genes in Ty2 (Table ). On the other hand, Ty2 has its own unique set of 11 pseudogenes as well as 195 in common with CT18 (Table ). Seven Ty2-specific pseudogenes result from frameshifts, while four result from disruptions by point mutations that create internal stop codons. One of the pseudogenes in CT18 (STY2012), which encodes a partial phage recombinase and which is located at a phage insertion point, is entirely lost in Ty2. A Ty2 gene (t0235) that encodes a putative chitinase has an internal stop codon and is annotated as a pseudogene in our Ty2 GenBank entry. Chitinase, related to lysozyme, is an enzyme that disrupts cell membranes by digestion of peptidoglycan linkages. In the corresponding CT18 annotation (ORF STY0257), although the DNA sequence for this gene is identical to that in the Ty2 annotation, it is stated that the stop codon is translationally suppressed by insertion of a tryptophan residue. While there has been a report of this mode of suppression at a low efficiency, we know of no evidence to support it in this particular case.
| TABLE 4. Genes that were pseudogenes in CT18 but intact in Ty2 |
| TABLE 5. Genes that were pseudogenes in Ty2 but intact in CT18 |
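The pseudogene counts given above for the two strains can be checked against one another with simple bookkeeping. The sketch below uses only numbers stated in the text; the variable names are ours, and the check treats the counts as plain totals without separately tracking STY2012, whose placement among these categories is not stated.

```python
# Counts as stated in the text.
ct18_pseudogenes = 204        # verified pseudogenes in CT18
ct18_intact_in_ty2 = 9        # CT18 pseudogenes that are intact genes in Ty2
ty2_specific = 11             # pseudogenes found only in Ty2
ty2_frameshifts = 7           # Ty2-specific pseudogenes caused by frameshifts
ty2_internal_stops = 4        # Ty2-specific pseudogenes caused by stop codons
ty2_pseudogenes_reported = 206
shared_reported = 195

# CT18 pseudogenes that are not intact in Ty2 should equal the shared set.
shared_from_ct18 = ct18_pseudogenes - ct18_intact_in_ty2
# Shared plus Ty2-specific pseudogenes should equal the Ty2 total.
ty2_total = shared_reported + ty2_specific

print("shared (from CT18 counts):", shared_from_ct18)
print("Ty2 total (shared + specific):", ty2_total)
print("Ty2-specific by cause:", ty2_frameshifts + ty2_internal_stops)
```

Under this simple accounting, the three derived values agree with the reported figures of 195 shared, 206 Ty2, and 11 Ty2-specific pseudogenes.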
Table shows genes in Ty2 that have changes causing their coding frames to diverge from those in CT18. For example,
gltX, encoding glutamyl-tRNA synthetase, is intact in Ty2, but in CT18, a frameshifting deletion results in a predicted protein three residues longer at the C terminus. Since
gltX is an essential gene, the extra residues presumably do not affect the synthetic activity appreciably. Some of the pseudogenes listed in Table to also may have a role in pathogenesis. In addition to the loss of 7 out of 12 fimbrial operons in both strains, Ty2 has lost 2 more (
stcC and
stbC) but has gained
fimI. Three other genes that are intact in Ty2 may be associated with pathogenicity. TtrS is a sensor for tetrathionate, an alternative electron acceptor in vitamin B12-dependent anaerobic growth, and may be important for intracellular survival. SopE2 is secreted by a type III mechanism into host cells, where it is involved in actin rearrangements (12), a common first step in bacterial attack or invasion of host cells. WcaA is thought to be a glycosyltransferase involved in the synthesis of colanic acid, which is secreted to form the exopolysaccharide capsule providing protection from, for example, dehydration, acid stress, and osmotic stress, conditions encountered both inside and outside the human host. This capsule is also implicated in another pathogenic mechanism, biofilm formation (
6,
28).
| TABLE 6. ORFs whose products have divergent C-terminal sequences |
NR-Z and RpoS. Differences between the loci for nitrate reductase Z (NR-Z) and RpoS in Ty2 and CT18 may seem to indicate significant differences in their abilities to thrive in anaerobic conditions, but a detailed investigation yielded more questions than answers. NR-A and NR-Z complexes provide electron transport during anaerobic respiration. CT18 has intact genes in both loci. In Ty2, the NR-A locus is intact, but the Z components encoded by
narW and
narV are fused by an in-frame deletion. Since both gene products are essential for a functional Z complex, this pathway is probably inactivated in Ty2, although the consequences are unclear. However,
S.
enterica serovar Typhi mutants deficient in anaerobic respiration are also less capable of intracellular replication (
5), a factor which would compromise virulence.
The structure and function of the Z complex are understood, but its role is obscure. The narZYWV genes (product, NR-Z) are homologs of the genes of the NR-A locus, narGHJI, which are active in anaerobic respiration. The four genes of each locus encode the alpha, beta, delta, and gamma components, respectively. By analogy with NR-A, the alpha component contains the catalytic site and requires a molybdenum cofactor for activity. The beta component contains an iron-sulfur center and transfers electrons from the gamma component to the alpha component. Both alpha and beta are cytoplasmic proteins. Paired gamma chains form a membrane anchor, and heme-iron centers are embedded in the membrane. Electrons are accepted by the gamma chains from quinone and then transferred to the beta protein, which in turn transfers them to the alpha subunits, where the reduction reaction is completed. The delta protein is not actually part of the complex but is essential for activity; it is thought to be important in assembling the complex. In Ty2 NarW, the C-terminal 54 aa out of 236 are deleted, and in NarV, aa 1 to 194 out of 225 are lost, including all of the transmembrane segments that form the membrane anchor.
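The extent of the NarW and NarV losses described above can be made explicit by computing how much of each protein is retained. The sketch below is illustrative: the `Truncation` class is hypothetical, the residue ranges are derived from the figures in the text, and the fused length assumes that the in-frame deletion joins the two remnants without adding or removing any further residues, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Truncation:
    protein: str
    full_length_aa: int
    deleted_start: int  # first deleted residue (1-based, inclusive)
    deleted_end: int    # last deleted residue (1-based, inclusive)

    @property
    def deleted_aa(self):
        return self.deleted_end - self.deleted_start + 1

    @property
    def retained_aa(self):
        return self.full_length_aa - self.deleted_aa


# NarW loses its C-terminal 54 of 236 aa, i.e., residues 183 to 236.
narw = Truncation("NarW", 236, 236 - 54 + 1, 236)
# NarV loses residues 1 to 194 of 225.
narv = Truncation("NarV", 225, 1, 194)

for t in (narw, narv):
    fraction = 100 * t.retained_aa / t.full_length_aa
    print(f"{t.protein}: {t.retained_aa} of {t.full_length_aa} aa retained ({fraction:.0f}%)")

# Assumption: the fusion product is just the two remnants joined end to end.
fused_length = narw.retained_aa + narv.retained_aa
print("Estimated NarW-NarV fusion length:", fused_length, "aa")
```

On these figures, most of NarW survives, whereas only a short C-terminal portion of NarV remains, in keeping with the loss of the membrane anchor noted above.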
NR-Z was thought to be expressed constitutively at a low level and was known not to be induced during anaerobic growth (
11). Recently, the locus in
S.
enterica serovar Typhimurium was shown to be induced by carbon starvation (stationary phase) and to be RpoS dependent (
34). NR-Z is also essential for starvation-induced heat and acid tolerance. NR-Z expression is not induced during anaerobic growth but is actually repressed by Fnr, the oxygen-sensing regulator. Active hybrid enzymes mixing NR-A and NR-Z subunits have been obtained (
2); therefore, it is possible that the NR-A membrane anchor tethers the NR-Z enzymes and that NarJ replaces the mutated NarW, if they are expressed under the same conditions as NR-Z.
However,
rpoS is also abnormal in Ty2 (although it is intact in CT18). The alternative sigma factor encoded by this gene regulates more than 30 genes in the stress response. In Ty2, a frameshift at aa 312 replaces the last 12 aa of the wild-type product with 74 different aa. This change was discovered when mutations were introduced into Ty2 in efforts to attenuate its virulence for the creation of a safe vaccine strain, called Ty21a (
29) or CVD908 (
17). Subsequently, attenuation proved to be due partly to a mutant RpoS which had not been deliberately introduced. The mutation was later shown to be present in the Ty2 parent strain as well (
30). Ty21a survived starvation and other stresses poorly, an advantage in a vaccine strain, and these deficiencies were complemented by an introduced wild-type RpoS. Unfortunately, no equivalent data exist for Ty2; therefore, we cannot be sure whether other mutations in Ty21a, either engineered or accidental, have influenced the stress response of Ty21a, nor can we determine whether there is any direct effect on the virulence of Ty2.
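The way in which a frameshift near the 3' end of a gene can replace a short C-terminal segment with a longer, unrelated one, as described above for RpoS, can be illustrated with a toy example. The sketch below is not based on the rpoS sequence: the DNA string is invented for illustration, and `translate` and `delete_base` are hypothetical helpers using the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna):
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the base at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]


# Invented sequence: a short ORF followed by downstream bases.
wild_type = "ATGAAACCGTGATTTGGCTAA"
mutant = delete_base(wild_type, 3)  # single-base deletion after the start codon

print("wild type:", translate(wild_type))
print("mutant:   ", translate(mutant))
```

In this toy case, the deletion shifts the reading frame past the original stop codon, so the mutant product shares only its first residue with the wild type and is longer. The real Ty2 change affects only the C-terminal end of a much larger protein.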
We obtained Ty2 for sequencing from a
Salmonella archive to minimize possible effects of repeated passaging. However, variant
rpoS genes have been identified in archived stocks of
Salmonella and have also been observed in
E.
coli K-12 W3110 (
35). Some spontaneous mutants that arise during prolonged starvation are associated with a growth advantage (
38). These observations suggest that mutation of
rpoS under extreme stress itself may be a stress response, permitting the selection of a more efficient transcription factor that can improve fitness to respond to adverse conditions.
Interestingly, the genomic region between
rpoS and
mutS is highly plastic in members of the family
Enterobacteriaceae (
15), with many rearrangements in this interval. In
S.
enterica serovar Typhimurium, the expression of the Spv proteins, necessary for bacterial survival inside invaded host cells, is under RpoS control. The Spv genes, carried on a plasmid in serovar Typhimurium, are not present in serovar Typhi, and little is known about how different host tissue invasion mechanisms may be in serovar Typhi or how dependent on RpoS they may be. In fact, the extended RpoS protein in Ty2 retains intact both the RNA polymerase core binding site and the putative DNA binding site that together bring about enhanced binding of the polymerase to the DNA template. Stationary-phase (starvation) responses are under the control of RpoE (a different alternative sigma factor) (
36), as are responses to other stimuli important in virulence, such as oxidative stress imposed by macrophage defenses. It is possible that these two transcription factors have some overlapping effects, making it very difficult to predict the effect of this particular mutant RpoS. It is interesting that
S.
flexneri 2a strain 2457T also has mutations in the
narZYWV locus as well as a mutant
rpoS (unpublished data). A frameshift changes the C-terminal 30 aa and truncates the transcript, whereas another 2a strain, 301, has also lost
narZ but retains
rpoS intact. There is much to be learned yet.
Clearly, while genome sequencing has revealed many genes with potentially important contributions to pathogenicity, discovering the details and deciphering the message, if any, in individual sets of pseudogenes will require extensive research in many laboratories, each with a specific expertise. A more appropriate model system is also needed. Newer techniques, such as microarray analysis of gene expression, have begun to provide the next level of information about genes involved in pathogenesis. Of the many knockout mutations already constructed in K-12 strain MG1655 in our laboratory, rpoS mutations should be very informative. A set of characterized mutations in an isogenic background should help to unravel the complex aspects of the stress response and provide clues as to how NR-Z may be involved.
Differences in pseudogene content between CT18 and Ty2 fall into no discernible pattern or functional relationship. These differences may have arisen due to variations in stresses applied by human host defense systems and may contribute subtle effects to the complex mechanisms of pathogenesis used by these two strains. They may also reflect a need to adjust the balance of metabolic capabilities to optimize virulence, perhaps achievable by more than one possible combination of genes.
When analyzing pseudogenes, as emphasized by McClelland et al. (
22), investigators should be aware that pseudogenes may be identified with confidence only when an intact homolog is found in a closely related (and sequenced) genome, and even then, annotation criteria vary. Some, but not all, annotations designate genes disrupted by IS elements as pseudogenes. It is often unclear whether small differences in protein structure, such as the three extra residues in GltX mentioned above, will eliminate function. Criteria based strictly on gene structure are not appropriate for secreted or surface proteins, such as those of fimbriae, whose genes are normally more variable than the conserved genes nearby (
27), with the obvious advantage of evading the host immune system or potentially increasing the ability of the bacterium to attach to and affect host tissues.
How can we account for the accumulation of pseudogenes? Although mutagenic processes such as transposase induction are triggered by the kinds of stresses that bacteria undergo in repeated passages in the laboratory, strains that have been used in laboratories for the longest periods,
E.
coli K-12 and
S.
enterica serovar Typhimurium strain LT2, have almost 1 order of magnitude fewer pseudogenes than strains of intracellular pathogens that have also been used in laboratories for long periods,
Y.
pestis KIM,
S.
flexneri 2457T, and Ty2. The largest numbers of pseudogenes have been observed primarily in host-adapted pathogens that grow intracellularly and are thought to result from an adaptive process (
33). In
S.
enterica serovar Typhi, adaptive changes have limited the host range to humans and (presumably) inactivated metabolic functions that are not needed for intracellular growth (
26,
37) or survival in the intestine. These differences, as well as any of the far more numerous differences between
S.
enterica serovar Typhi and serovar Typhimurium strains, may underlie disease characteristics that are overtly or subtly distinctive of the pathogenic potential of the strains. It is important that each difference be examined with expert knowledge to identify the genetic variables that may yield valuable information through experimental evaluation. This goal may remain costly and difficult to achieve without an affordable animal model that accurately reproduces the human disease. Inspection of every pseudogene for the possibility of residual or altered activity is not a trivial task; in many cases, even this initial test is not immediately possible, since researchers may have no idea of the function of encoded proteins or of which parts of encoded proteins may be essential for function.