We determined the complete genome sequence of Shigella flexneri serotype 2a strain 2457T (4,599,354 bp). Shigella species cause >1 million deaths per year from dysentery and diarrhea and have a lifestyle that is markedly different from those of closely related bacteria, including Escherichia coli. The genome exhibits the backbone and island mosaic structure of E. coli pathogens, albeit with much less horizontally transferred DNA and lacking 357 genes present in E. coli. The strain is distinctive in its large complement of insertion sequences, with several genomic rearrangements mediated by insertion sequences, 12 cryptic prophages, 372 pseudogenes, and 195 S. flexneri-specific genes. The 2457T genome was also compared with that of a recently sequenced S. flexneri 2a strain, 301. Our data are consistent with Shigella being phylogenetically indistinguishable from E. coli. The S. flexneri-specific regions contain many genes that could encode proteins with roles in virulence. Analysis of these will reveal the genetic basis for aspects of this pathogenic organism's distinctive lifestyle that have yet to be explained.
Shigella is an important human pathogen, responsible for the majority of cases of endemic bacillary dysentery prevalent in developing nations. An estimated 1.1 million deaths and 160 million cases per year are attributed to shigellosis (32). Currently, no vaccine is available that can provide adequate protection against the many different serotypes of Shigella. Existing antimicrobial treatments are becoming compromised due to increased antibiotic resistance, cost of treatment, and continuing poor hygiene and unsanitary conditions in the developing world.
Shigella is pathogenic only for humans. It causes disease by invading the epithelium of the colon, resulting in an intense acute inflammatory response (51). Shigella strains are unusual among enteric bacteria in their ability to gain access to the epithelial cell cytosol, where they replicate and spread directly into adjacent cells. Shigella strains contain a large virulence plasmid that is known to encode genes required and sufficient for invasion of epithelial cells (61). However, chromosomal genes present in “pathogenicity islands” also participate in the pathogenic process directly or contribute to survival in the environments encountered during infection (2, 21, 22, 49, 58, 70). The genetic bases for several aspects of the pathogenic process and intracellular lifestyle of Shigella, including the mechanisms of species specificity, tissue tropism, and restriction of the immune response, are still poorly understood (Table 1) and probably involve chromosomally encoded proteins. In common with other enteric bacteria, Shigella survives the proteases and acids of the intestinal tract by uncertain means. Highly tissue-specific disease results from a very low infectious dose (10 to 100 bacteria) and in the absence of flagellum-based motility. We selected the virulent strain 2457T of Shigella flexneri serotype 2a (33) for sequencing because it has been widely used for genetic research and for clinical challenge studies. Although Shigella spp. have been regarded as distinct from Escherichia coli, as early as 1972, DNA hybridization studies estimated that Shigella and E. coli are taxonomically indistinguishable at the species level (5). Recent work of the Reeves group (34, 56, 57) based on multilocus enzyme electrophoresis and sequencing of a small number of genes places Shigella clearly within the genus Escherichia and arising several times independently. Comparison of the complete S. flexneri genome sequence with that of E. coli K-12 establishes the precise genetic relationship of S. flexneri to E. coli. Given the markedly different lifestyles of intracellular Shigella and extracellular E. coli, the comparison should also reveal important genetic differences expected to underlie pathogenesis, other than the presence or absence of the virulence plasmid.
S. flexneri 2a 2457T was obtained from the Walter Reed Army Institute of Research. The sequenced strain has been redeposited in the American Type Culture Collection under accession no. ATCC 700930.
Bacteria were grown in Luria-Bertani (LB) medium at 37°C, and genomic DNA was prepared by R. A. Welch at the University of Wisconsin. The genomic DNA was released from bacteria embedded in agarose to prevent shearing during preparation (44). Whole-genome libraries in M13Janus (7) and pBluescript KS− (Stratagene) were prepared by using nebulization to randomly shear genomic DNA extracted from agarose by digestion with Gelase (Epicentre) (44). Random clones were sequenced by Applied Biosystems Prism dye-terminator chemistry, and data were collected with ABI377 and 3700 automated sequencers. Sequence reads (66,219 with an average length of 502 nucleotides [nt]) were assembled by Seqman Genome Edition (DNASTAR). Additional PCRs and sequencing reactions were performed to close gaps, improve coverage, and resolve sequence ambiguities. The final coverage was 7.2X. A whole-genome optical map (38) for restriction enzyme XhoI was prepared to aid the ordering of contigs during assembly and so that the end points and lengths of inversions could be confirmed.
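The reported coverage can be checked roughly from the read statistics given above. The following is a minimal sketch, assuming that coverage is simply total read bases divided by the chromosome length (4,599,354 bp, as stated elsewhere in this paper); it ignores plasmid sequence and any read trimming, and the function name is illustrative only.

```python
# Figures quoted in the text; the calculation itself is an assumed
# back-of-the-envelope estimate, not the assembly software's own metric.
READ_COUNT = 66_219
MEAN_READ_LENGTH_NT = 502
CHROMOSOME_LENGTH_BP = 4_599_354


def estimated_coverage(read_count, mean_read_length, genome_length):
    """Return average depth as total read bases divided by genome length."""
    return read_count * mean_read_length / genome_length


coverage = estimated_coverage(READ_COUNT, MEAN_READ_LENGTH_NT, CHROMOSOME_LENGTH_BP)
print(f"{coverage:.1f}X")
```

Under these assumptions, the estimate agrees with the stated final coverage of 7.2X.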
Potential open reading frames (ORFs) were defined by GeneMark.hmm (42) or Genequest (DNASTAR). All predicted proteins larger than 30 amino acids were searched against the nonredundant and local databases. tRNAs were identified with tRNAscan-SE (40). Alternative translation start sites were chosen to conform to the annotated MG1655 sequence. Frameshifts and point mutations were carefully verified for authenticity, and disrupted genes with homologs in K-12 were annotated as “pseudogenes.” Predicted backbone proteins were considered to be orthologs when matches to the corresponding K-12 protein exceeded 90% amino acid identity, alignments included at least 90% of both proteins, and no equivalent match was found elsewhere in the 2457T genome. The protein-level matches were also individually inspected to include genes with lower similarities within colinear regions of the genomes. The genome sequence was compared with that of MG1655 by the modified maximal exact match (MEM) alignment utility that was used for the comparison of EDL933 and K-12 (54). The genomic comparison with strain 301 was performed by a new multigenome comparison tool, Mauve.
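The three ortholog criteria above (identity exceeding 90%, alignment covering at least 90% of both proteins, and no equivalent match elsewhere in 2457T) can be expressed as a simple filter. The sketch below is only an illustration of that logic; the record layout, function names, and protein identifiers are hypothetical, and it does not reproduce the manual inspection of colinear regions described in the text.

```python
from dataclasses import dataclass


@dataclass
class ProteinMatch:
    """One protein-level match (hypothetical record layout)."""
    sf_id: str          # 2457T protein identifier (made up for illustration)
    k12_id: str         # K-12 protein identifier (made up for illustration)
    identity: float     # percent amino acid identity
    sf_length: int      # length of the 2457T protein
    k12_length: int     # length of the K-12 protein
    sf_aligned: int     # aligned residues in the 2457T protein
    k12_aligned: int    # aligned residues in the K-12 protein


def passes_thresholds(m, min_identity=90.0, min_coverage=0.90):
    """Identity must exceed 90%; the alignment must span >=90% of both proteins."""
    return (
        m.identity > min_identity
        and m.sf_aligned / m.sf_length >= min_coverage
        and m.k12_aligned / m.k12_length >= min_coverage
    )


def call_orthologs(matches):
    """Map each K-12 protein to its 2457T ortholog when exactly one 2457T
    protein passes the thresholds (no equivalent match elsewhere)."""
    hits = {}
    for m in matches:
        if passes_thresholds(m):
            hits.setdefault(m.k12_id, set()).add(m.sf_id)
    return {k12: next(iter(sf)) for k12, sf in hits.items() if len(sf) == 1}


example = [
    ProteinMatch("sf_A", "k12_A", 99.0, 100, 100, 100, 100),  # clean match
    ProteinMatch("sf_B", "k12_B", 95.0, 200, 200, 150, 150),  # coverage too low
    ProteinMatch("sf_C", "k12_C", 97.0, 300, 300, 290, 290),  # two equivalent
    ProteinMatch("sf_D", "k12_C", 96.0, 300, 300, 295, 295),  # matches to k12_C
]
orthologs = call_orthologs(example)
print(orthologs)
```

In this toy input only the first pair is retained: the second fails the coverage requirement, and the third K-12 protein has two qualifying 2457T matches.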
The complete, annotated sequence was deposited in GenBank under accession no. AE014073.
The genome consists of a single circular chromosome of 4,599,354 bp with a G+C content of 50.9%. Features of the genome and its comparison with E. coli K-12 (4) are shown in Fig. 1. Base pair 1 of the chromosome was assigned to correspond with bp 1 in K-12, since the two strains share extensive homology. The origin and terminus of replication were identified within homologous regions. The genome encodes 4,084 predicted genes, with an average size of 873 bp (926 bp if insertion sequences are excluded). The genome is slightly smaller than that of K-12 (4,639,221 bp), and its organization is roughly similar to that described for pathogenic E. coli strain O157:H7 EDL933 (54) and the uropathogen CFT073 (73), with large regions of colinear E. coli backbone punctuated by islands of sequence presumably acquired by horizontal transfer. The number of islands is smaller than in CFT073 and O157:H7, and a larger proportion of the genome is backbone (82% versus 75% for O157:H7 and CFT073). There are 15 rearrangements >5 kb in the genome (inversions and translocations) detected by comparison with K-12 (Fig. 1). Seven rRNA operons are present; their organization was altered from that in K-12 by genomic rearrangements. Ninety-eight tRNA genes include three copies of a novel cluster of four tRNAs (Ile, Arg, Thr, and Gly); only one of these (Gly) is identical to a K-12 tRNA. Each cluster in 2457T is in a prophage region, positioned downstream of the phage Q gene, as in the EDL933 Stx2 phage 933W (55).
Large symmetric chromosomal inversions spanning the replication origin and terminus have been observed when closely related bacterial species are compared (10, 13). The architecture of the S. flexneri genome has been affected by multiple large inversions compared to that of the K-12 genome, mostly spanning the axes of the origin and terminus of replication (inner circles in Fig. 1). Additional deletions and unequal crossover events have also taken place, resulting in two replichores of slightly unequal lengths, as found in the genome of Salmonella enterica serovar Typhi strain Ty2 (11). The rearrangement spanning the origin of replication is clearly indicated by the reorganization of the four rRNA operons nearest to it, which have been switched to the other replichore while maintaining their relative locations (shown by a red band in the seventh circle). Figure 1 also shows a smaller segment adjacent to the origin, within the larger inversion, that has reinverted without affecting any rRNA loci (shown by a dark blue band adjacent to the origin in the seventh circle). Unlike the inter-replichore inversions reported in Yersinia pestis (10), S. enterica serovar Typhi (11, 39), and E. coli K-12 strain W3110 (28), those in S. flexneri are not associated with rRNA homologies; instead, the insertion sequence (IS) elements that are present at most of the inversion ends most probably mediated the chromosomal recombinations.
The S. flexneri chromosome was known to be rich in insertion sequences (45, 53). The IS elements we identified (Table 2) make up 6.7% (309.4 kb) of the chromosome, in contrast to the typical range of 0 to ~4%. The archaeon Sulfolobus solfataricus is a significant exception, because ~10% of its 2.99-Mb genome is composed of ISs, which is unusual even among archaea. In the sequenced E. coli genomes, the IS content is <1.5%, and in Y. pestis, the IS content is ~3%. The virulence plasmid of S. flexneri also has an extremely high IS content (53% of the plasmid-encoded genes) (69). Of the 284 IS elements in 2457T, 108 are IS1X1 copies. The intact IS1 elements in this genome are typically families with 98 to 100% nucleotide sequence identity. Forty-six IS1 elements still have detectable flanking direct repeats, indicating recent acquisition (20 are full length, 9 bp; 24 are 8 bp; and 2 are 7 bp), and relatively little amelioration has occurred within these IS1 sequences. Comparative genome analysis with E. coli K-12 showed that 156 IS elements are involved in deletions or inversions associated with backbone rearrangements or with presumed horizontal transfer. The arrangements of several nested clusters of IS indicate that at each cluster, one integrated IS has acted as a target for subsequent insertions, resulting in multiple disrupted elements, with only the most recently acquired IS remaining intact.
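The IS figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text (309.4 kb of IS sequence, a 4,599,354-bp chromosome, and the direct-repeat tallies for IS1); the variable names are illustrative.

```python
CHROMOSOME_LENGTH_BP = 4_599_354
IS_TOTAL_BP = 309_400  # 309.4 kb of IS sequence, as quoted in the text

# Fraction of the chromosome occupied by IS elements, as a percentage.
is_percent = 100 * IS_TOTAL_BP / CHROMOSOME_LENGTH_BP

# IS1 elements with flanking direct repeats, keyed by repeat length in bp.
is1_direct_repeats = {9: 20, 8: 24, 7: 2}
is1_with_repeats = sum(is1_direct_repeats.values())

print(f"IS content: {is_percent:.1f}% of the chromosome")
print(f"IS1 elements with direct repeats: {is1_with_repeats}")
```

Both results agree with the values stated above (6.7% and 46 elements).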
Comparison of the S. flexneri and K-12 genome sequences revealed 37 islands >1 kb in the S. flexneri backbone that encode at least one gene not related to transposable elements. In contrast, EDL933 and CFT073 both have more than 100 islands >1 kb. The island ORFs show similarity with proteins in a wide range of organisms, including plant and animal pathogens with a variety of lifestyles, indicating acquisition from many different sources (Table 3). Eight of the 37 S. flexneri islands encode a putative integrase, and seven islands are located at tRNAs: selC, leuX, aspV, asnT, argW, pheV, and glyU. Only four of the islands at tRNA sites include integrases. Unlike YSH600, a 2a serotype from Japan containing fec and resistance loci at serX (41), 2457T has no island at this site; the fec locus is elsewhere and is not associated with antibiotic resistance. Five of the islands show a cryptic prophage-like organization, and apparently there are two prophages together in two of the islands. Five other islands with few phage genes may also be prophage remnants, for a total of 12 putative prophages. All are cryptic, and the larger ones show mosaic structures that could have been produced by recombination between lambdoid phage genomes. In S. flexneri, the genes responsible for serotype conversion (modification of the basic O antigen via glucosylation and/or O acetylation) are encoded by lysogenic bacteriophages. Although in at least one serotype 2b strain, the type II antigen is encoded by an inducible bacteriophage, SfII (47), in 2457T, the serotype conversion genes (gtrAII, gtrBII or bgt, and gtrII) are part of a cryptic prophage disrupted by multiple IS elements and associated genome rearrangements. One island carries the remnant of an integrated plasmid, including arsenate resistance and plasmid replication genes. Islands lacking phage-like genes are generally bounded by IS elements, which have presumably mediated island integration. The predominance of matches to O157:H7 proteins (Table 3) probably reflects the contents of GenBank rather than suggesting a particularly close relationship between 2457T and O157:H7.
S. flexneri was known to harbor a large virulence plasmid, which contains all of the genes required to express the invasive phenotype (61), and two small multicopy plasmids. We sequenced all three plasmids from strain 2457T: pINV-2457T (218 kb), pSf2, and pSf4. We compared the sequence of pINV to those of three S. flexneri virulence plasmids: pWR100 (GenBank accession no. AL391753), pWR501 (AF348706), and pCP301 (AF386526). The results showed that they are all essentially identical, with a few IS element differences and ~150 single-nucleotide differences distinguishing them. In the course of assembling the genome sequence of S. flexneri 2457T, we also unexpectedly identified a fourth plasmid of 165 kb. This was an S. enterica serovar Typhi R27-like plasmid, which we named “pSf-R27.” The R27 plasmid (62) was thought to be limited to Salmonella, in which it is implicated in the accumulation and spread of antibiotic resistance, but more recently, the similarity noted between R27 and pMT1, the large virulence plasmid of Y. pestis, suggested that there may have been a common ancestral plasmid. Sequence comparison showed that in pSf-R27, Tn10 (carrying tetracycline resistance genes), IS30, and a citrate uptake locus are absent, while the rest of the plasmid is 99.7% identical to R27. PCR was used to screen 142 S. flexneri isolates, including 57 of serotype 2a, for R27 sequences. The sequenced strain, 2457T, was the only strain to give a positive result. 2457T isolates from two other research groups that had obtained the strain from the same source were screened; the plasmid was found in one but not the other. Since 2457T was originally isolated before antibiotic usage had become widespread, it is possible that pSf-R27 may represent a primordial state of the R plasmid subsequently lost from the negative isolate, although we cannot formally exclude the possibility that pSf-R27 was accidentally introduced shortly after the strain was first isolated.
While islands represent insertions into the S. flexneri genome, there are also a large number of gene disruptions and deletions. Disruptions resulted in 372 pseudogenes (8.1% of the genome), caused by several mechanisms, including single-nucleotide indels, point mutations, and IS elements. (IS alone accounts for 27 disruptions and 85 truncations.) Larger IS-mediated deletions and insertions are also seen. In total, 879 genes of K-12 are either absent or are pseudogenes in S. flexneri. Many types of function are missing (Table 4). The missing function is sometimes supplied by a plasmid- or island-encoded gene. The chromosomal fepE is a pseudogene; FepE is a homolog of Cld in K-12, encoding an O-antigen chain-elongation factor. An intact homolog is found on one of the small multicopy S. flexneri plasmids, and this FepE function is required for virulence (23, 65). Similarly, the mhp operon of K-12 is involved in catabolism of small aromatic molecules. Although it is missing from S. flexneri, an alternative system with similar activity is encoded by the hpa locus present on an island. This locus is also found in E. coli C and W and Y. pestis, but not K-12. K-12 genes missing from the S. flexneri backbone are clustered in K-12, suggesting either a single deletion event for each group in S. flexneri or their absence from a common ancestor, with later acquisition by K-12 via horizontal transfer. As an example, the island at tRNA leuX is completely different in K-12, EDL933, CFT073, and 2457T. Clearly, the four strains acquired these islands by distinct events, even if some could have been replacements rather than insertions. Phenotypic tests that have been widely used to distinguish E. coli from S. flexneri are largely explained by pseudogenes, which account for loss of flagellar motility; utilization of mucate, acetate, various sugars, and glycerol; and the requirement for NAD.
Despite their differences, there persists a high level of similarity among S. flexneri, K-12, and O157:H7. We show in Fig. 2 that the intact proteins shared by all three strains make up by far the largest category. In contrast, few proteins are shared by S. flexneri and O157:H7 but not K-12, demonstrating that the shared colinear backbone is the underlying feature connecting these genomes. The extensive backbone regions we identified in S. flexneri are consistent with phylogenetic reconstructions placing it among the members of the genus Escherichia (56, 57, 71). To examine the predicted proteins on a global scale, we compared backbone proteins in common among S. flexneri, O157:H7, and S. enterica serovars Typhi and Typhimurium (Fig. 3), and these results clearly show that S. flexneri and E. coli are indistinguishable, but quite distinct from the two Salmonella strains, supporting Reeves' suggestion that new nomenclature should be adopted to more accurately reflect the phylogeny (71).
At the same time this paper was submitted, the genome sequence of S. flexneri strain 301 was published (25) under GenBank accession no. AE005674. This strain was isolated in 1984 from a patient in China, providing an interesting genome of the same serotype but geographically and temporally separated from 2457T. We compared the genome sequences and annotated features with those of 2457T. The genome of strain 301 is 4,607,203 bp, 7.85 kb larger than 2457T, which is largely accounted for by differences in IS complement, of which strain 301 has 247 complete and 6 partial ISs, whereas 2457T has 242 complete and 42 partial ISs. There are 45 IS loci that are different between the two strains. The genome sequences are very similar, but there are more than 1,400 single-nucleotide differences between them, scattered throughout. We found no evidence in 2457T for the unusual set of three spacer tRNAs (tRNAGlu, as well as tRNAIle and tRNAAla) in the rrnH operon in strain 301, and no example of this type appears in the RNA spacer region database (19). The spacer tRNAs also differ from those in K-12 and 2457T in the rrnA, rrnD, and rrnG operons.
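The size and IS comparisons between the two strains follow directly from the numbers quoted above. The short sketch below recomputes them; the dictionary layout and variable names are illustrative only.

```python
# Values quoted in the text for the two sequenced S. flexneri 2a strains.
GENOME_LENGTH_BP = {"301": 4_607_203, "2457T": 4_599_354}
IS_COUNTS = {
    "301": {"complete": 247, "partial": 6},
    "2457T": {"complete": 242, "partial": 42},
}

# Difference in chromosome length, converted from bp to kb.
size_difference_kb = (GENOME_LENGTH_BP["301"] - GENOME_LENGTH_BP["2457T"]) / 1000

# Total IS elements (complete plus partial) per strain.
is_totals = {strain: sum(counts.values()) for strain, counts in IS_COUNTS.items()}

print(f"Strain 301 is {size_difference_kb:.2f} kb larger than 2457T")
print(is_totals)
```

The difference of 7,849 bp matches the stated 7.85 kb, and the 2457T total of 284 IS elements matches the count given in the IS analysis earlier in the paper.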
The genome of 2457T shows rearrangements relative to strain 301 (Fig. 4) as well as, and distinct from, those relative to K-12. Around the origin of replication, strain 301 is colinear with K-12, whereas 2457T is not. Around the terminus, a large inversion in 2457T relative to strain 301 was followed by reinversion of most of the DNA within the rearrangement (Fig. 4), leaving two small patches of inverted sequence marking the end points of the initial event. These recombinations were apparently mediated by IS elements.
The island contents are similar in the two strains, but some islands show a different organization. Examples are found in the island containing the sitABCD genes and the islands at leuX and thrW. In 2457T, the sit island is integrated at tRNAGly, one of the novel tRNAs, but in strain 301, these tRNAs are distant, due to the rearrangement around the terminus. The thrW island is a serotype-converting prophage that contains several extra unknown genes in strain 301. The leuX island in 2457T contains the cadB pseudogene, which is not present in strain 301. No small multicopy plasmids were reported for strain 301.
The annotated strain 301 genome sequence shows 254 pseudogenes, compared with 372 pseudogenes in 2457T. Some of these differences are due to individual annotation criteria and styles, but 159 are pseudogenes in both strains, of which 42 have unknown functions. Each strain has its own unique set of pseudogenes. Those with known or predicted functions are listed in Table 5: 100 pseudogenes in 2457T and 20 in strain 301. The significance, if any, of the backbone and pseudogene complements of the two strains remains unclear.
Even though our current knowledge of S. flexneri pathogenesis is detailed in some respects, much remains to be discovered (Table 1). Genome analysis provides important clues to linking processes with specific genes and products. For example, 10 islands may be involved in niche-specific processes or virulence. Some have been analyzed previously and shown to contribute to virulence, including Pic mucinase and ShET1 enterotoxin (2), as well as the aerobactin siderophore genes in SHI-2 at selC (70) (SHI-3 at pheU in S. boydii) (58). Eight other smaller and previously uncharacterized islands contain iron uptake and utilization clusters and putative adhesins. One contains the sit genes, encoding an iron uptake system. With the S. flexneri 2a sequence used as a probe, the sit genes were found in all S. flexneri species and enteroinvasive E. coli, but not in the other pathogenic and nonpathogenic E. coli strains examined (60). Expression of the Sit proteins is induced in the intracellular environment (60). Thus, the Sit system may play an important role in iron sequestration in the intracellular environment of the host. Also encoded in islands are possible specific adhesins, similar to components (LpfA and -C) of long polar fimbriae in S. enterica serovar Typhimurium. Others resemble the Saf proteins of S. enterica serovar Typhimurium.
The IpaH proteins encoded on the virulence plasmid of S. flexneri (8, 68) consist of a conserved C-terminal domain and a variable N terminus containing a leucine-rich repeat (LRR). They are secreted by the plasmid type III secretion system, and at least one (IpaH7.8) has been shown to aid S. flexneri in escaping from the macrophage vacuole and is considered to be a virulence factor (16). There are five copies of ipaH on the plasmid, and we found seven more in the 2457T genome, of which four are intact, containing both the LRR domain and the conserved region. The genome sequence of strain 301 (25) also revealed four complete and three incomplete genomic copies. Figure 5 illustrates the differences between the ipaH genes in the two genomes. In both genomes, the incomplete copies are disrupted by insertion sequences or frameshifting mutations. One of the incomplete 2457T copies is highly divergent from all of the other ipaH genes.
The genome sequence of S. flexneri offers new candidate genes with potential for involvement in pathogenicity, including predicted proteins similar to virulence factors in other organisms. Among these data, missing links in S. flexneri pathogenesis may be found (Table 1). For example, the molecular mechanisms of species and tissue tropism, including the adhesins potentially specific for the human colonic epithelium, remain hidden. This is due in part to the lack of a suitable animal model. Mice do not become infected following oral inoculation of S. flexneri; therefore, mouse models have been restricted to pulmonary and conjunctival infections, which differ in important respects from colonic infection. Among the island ORFs of S. flexneri are 7 that are similar to adhesins from other pathogenic organisms and 68 that lack significant similarity to proteins of known function, including 9 predicted to encode secreted or membrane proteins, which are therefore strong candidates for mediating direct interactions with host cells. The unique complement of fimbrial adhesins in S. flexneri presumably underlies host specificity, as has been suggested for S. enterica serovar Typhi, another exclusively human pathogen (66). Of particular interest are the ORFs similar to the Salmonella SafABC. As S. enterica serovar Typhimurium is also an intracellular pathogen of intestinal epithelial cells and macrophages, this locus may encode components of an adhesin contributing to host or tissue specificity. In addition, ORFs S3961 and S4048 encode a major type 1 fimbrial subunit and usher protein essentially identical to proteins of enterohemorrhagic E. coli O113:H21, which is pathogenic for humans and cattle.
While a specific host cell receptor may not be the only valid explanation for host specificity, it is consistent with experimental data and in vivo observations. We emphasize that there are clear differences among the consequences of infection of cultured mammalian cells and inoculation of mice or humans. When grown as a nonpolarized and nonconfluent monolayer, cells from a wide variety of hosts and anatomic origins are readily invaded by S. flexneri. When cells are grown as a polarized and confluent monolayer, S. flexneri invades them only at the basolateral membrane (50). However, in the context of an intact animal host, only cells of the human or monkey colonic mucosa or mouse respiratory epithelium have been shown to be infected by S. flexneri. S. flexneri strains have not been shown to cause intestinal disease in nonprimates, and in mice, S. flexneri strains appear not to invade the colonic mucosa (M. B. Goldberg, unpublished data). Thus, while alternative explanations of S. flexneri species and tissue specificity exist, a specific receptor on polarized primate colonic cells might be involved in the specific invasion of this tissue. In particular, such a receptor might be important to S. flexneri gaining access to the basolateral sides of these cells.
Expression of receptor candidate proteins in nonpathogenic E. coli and screening for adherence to appropriate human tissue (24) might then allow the unique human cellular receptor to be identified (36). From there, the construction of a transgenic mouse model for S. flexneri infection is possible, as reported for Listeria monocytogenes (37), another human-specific intestinal pathogen that causes disease in humans but not mice. An improved animal model will greatly facilitate evaluation of candidate genes with possible roles in virulence.
Experimental evidence suggests that IpaH proteins may play a role in modulating the host response to infection. IpaH7.8 on the invasion plasmid was shown to help S. flexneri escape from macrophage vacuoles (16). Mutations in two ipaH genes on the invasion plasmid induce an exaggerated keratoconjunctivitis response with greater-than-normal inflammation in guinea pig eyes, and IpaH9.8 encoded on the plasmid was shown to translocate to the host nuclei in tissue culture cells (67), but the precise functions of these proteins remain unknown. Unlike the ipaH genes on the invasion plasmid, the genome-encoded ipaH genes are mostly associated with prophage-like islands, reminiscent of the Salmonella lambda-like Gifsy prophages, which encode effector proteins of the YopM/IpaH family (48). Lysogenic conversion with these phages is responsible for much of the diversity of the effector protein repertoires observed among Salmonella spp. (48). The finding that ipaH genes on the plasmid and chromosome may show strain-specific differences in sequences is a novel observation and might suggest that, like in Salmonella, the ipaH gene family might contribute to diversity of effector molecules. This remains to be tested.
IpaH proteins belong to the superfamily of LRR-containing proteins, which includes members from bacteria, plants, and vertebrates (6, 27). The conservation level of these proteins indicates that the LRR probably has structural or functional significance. IpaH-like proteins are found in the animal pathogens Salmonella, Yersinia, and Listeria, as well as the plant pathogens Rhizobium, Bradyrhizobium, and Ralstonia, again often associated with prophage (9, 18, 20, 35). In many host organisms, including plants, receptors involved in recognizing invading pathogens are also LRR proteins: for example, mammalian Toll-like receptors and the NB/LRR family in plants (1, 26). Experimental evidence accumulating from various studies of host-pathogen interactions is beginning to suggest that the bacterial effector proteins might interfere with or modulate the host receptor activity, presumably enabling the pathogen to evade the host's defensive response.
Acquisition of new traits by horizontal transfer has enabled microorganisms to survive in new niches. A complementary loss-of-function mechanism has been proposed (52, 64) by which virulence is enhanced through mutation of ancestral genes encoding factors that interfere with the expression or function of traits necessary for success in the new environment. Acquisition of the virulence plasmid enabled S. flexneri to enter the highly specialized intracellular environment in human intestinal epithelial cells. In this new niche, genes that were required in the intestinal lumen may be deleterious or are no longer beneficial and may accumulate mutations without a selective force to maintain them. Lysine decarboxylase (CadA) produces cadaverine, which inhibits the escape of S. flexneri from the vacuole into the cell cytosol (15, 46). Since S. flexneri replication and spread are dependent upon its access to the cytosol, biosynthesis of cadaverine attenuates virulence. In 2457T, cadA and cadC, which encodes a transcriptional activator of the cad operon, are deleted (entirely absent from the genome). Lack of surface structures such as flagella, fimbriae, and curli in S. flexneri provides the advantage of fewer antigens that can be easily recognized by the host immune system. In 2457T, of 14 dysfunctional genes of flagellar biosynthesis, 11 (fliF, fliJ, fliP, flgC, flgE, flgF, flgK, flgL, flhA, flhB, and cheR) contain frameshifts and 1 (fliA) contains a point mutation, while IS1 elements truncate flhD and flhE.
Although invasion and intercellular spread are well studied (51), many of the signaling and gene expression controls that orchestrate these processes are unknown (Table 1) and might provide new points of therapeutic intervention. Although S. flexneri is an intracellular pathogen, adaptive immunity to S. flexneri may be restricted to B-lymphocyte-dependent humoral responses. Human adaptive immunity is serotype specific, and exposure induces production of specific immunoglobulins (17, 59). In mouse models, adaptive immunity is completely independent of T-lymphocyte function (72). However, the mechanism by which S. flexneri modulates T-lymphocyte responses is unknown. With the sequence known, gene chips could now be used to interrogate expression profiles during infection, identifying all of the genes responding to the various changing conditions of particular interest, including oxidation, temperature shift, and iron depletion, which are specifically induced in the intracellular environment.
The high incidence of shigellosis and the proliferation of drug resistance have spurred serious efforts in vaccine development. Some success has been reported with live attenuated bacteria with mutations in the plasmid gene virG (necessary for intercellular spread), both alone and in combination with chromosomal deletions of aroA (aromatic amino acid synthesis), iuc (aerobactin), set (enterotoxin), or guaBA (purine biosynthesis pathway) (29-31). New candidate genes, when characterized, will provide alternative routes to further attenuation while maintaining antigenicity.
Because of their ability to enter the cytosol of mammalian cells, S. flexneri strains have been developed as delivery vehicles of antigens to major histocompatibility complex class I for immunization or of DNA into target cells for gene therapy (3, 12, 14, 63). Again, optimization of these approaches will require sufficient attenuation of the S. flexneri vehicle, specific binding to target cells, and controlled modulation of the immune response.
Knowledge of all the proteins encoded in the 2457T genome provides the entire repertoire of surface proteins that are potential vaccine targets, and candidates found to be adequately antigenic could therefore be used singly or in combination, engineered for expression from recombinant constructs, or even used directly in DNA vaccines. The sequence will also facilitate identification of many of the corresponding vaccine candidate genes in other S. flexneri serotypes, both type specific and in common. Comparison with the genome of nonpathogenic E. coli will reveal factors that, like cadaverine, block or limit survival of S. flexneri in host tissue. Thus, functions no longer active (pseudogenes) in S. flexneri but expressed in nonpathogenic E. coli may lead to the development of novel S. flexneri-specific therapies by virtue of a suppressive effect on bacterial growth or tissue invasion. These genome-driven research activities will serve as starting points for a new phase of vaccine and molecular pathogenicity investigation.
We thank the members of the University of Wisconsin genomics team for expert technical assistance.
This work was supported by Public Health Service grants AI-44387 to F.R.B. and AI-43562 to M.B.G.
J.W., M.B.G., and V.B. contributed equally to this work.
Editor: J. T. Barbieri
†Paper no. 3603 from the Laboratory of Genetics.