DDBJ/EMBL/GenBank accession nos+
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact email@example.com
The Shigella bacteria cause bacillary dysentery, which remains a significant threat to public health. The genus status and species classification appear no longer valid, as compelling evidence indicates that Shigella, as well as enteroinvasive Escherichia coli, are derived from multiple origins of E.coli and form a single pathovar. Nevertheless, Shigella dysenteriae serotype 1 causes deadly epidemics but Shigella boydii is restricted to the Indian subcontinent, while Shigella flexneri and Shigella sonnei are prevalent in developing and developed countries respectively. To begin to explain these distinctive epidemiological and pathological features at the genome level, we have carried out comparative genomics on four representative strains. Each of the Shigella genomes includes a virulence plasmid that encodes conserved primary virulence determinants. The Shigella chromosomes share most of their genes with that of E.coli K12 strain MG1655, but each has over 200 pseudogenes, 300~700 copies of insertion sequence (IS) elements, and numerous deletions, insertions, translocations and inversions. There is extensive diversity of putative virulence genes, mostly acquired via bacteriophage-mediated lateral gene transfer. Hence, via convergent evolution involving gain and loss of functions, through bacteriophage-mediated gene acquisition, IS-mediated DNA rearrangements and formation of pseudogenes, the Shigella spp. became highly specific human pathogens with variable epidemiological and pathological features.
Shigella is a group of Gram-negative, facultative intracellular pathogens. Recognized as the etiologic agents of bacillary dysentery or shigellosis in the 1890s, Shigella was adopted as a genus in the 1950s and subgrouped into four species: Shigella dysenteriae, Shigella flexneri, Shigella boydii and Shigella sonnei (also designated as serogroups A to D) (1). The bacteria are primarily transmitted through the faecal-oral route and therefore continue to threaten public health mainly in developing countries where sanitation is poor. The estimated annual number of episodes of shigellosis is 160 million, with 1.1 million deaths, mostly children under 5 years old in developing countries (2). Owing to the emerging multiple resistance strains that have compromised antibiotic treatment, development of effective novel vaccination strategies is urgently required (3).
There are very few biochemical properties that can distinguish Shigella from enteroinvasive Escherichia coli (EIEC), which are also a major cause of dysentery. Indeed, some O-antigens associated with EIEC are identical to those found in Shigella spp. (4), and many plasmid-associated virulence determinants are common to both EIEC and Shigella (5).
The virulence plasmid encodes the ~30 kb Mxi-Spa type III secretion system (TTSS) and invasion plasmid antigens (Ipa proteins) required for invasion of the colonic and rectal epithelial cells, and the release of the bacteria into the host cell cytosol. IpaB in particular induces apoptosis in host macrophages and dendritic cells, leading to inflammatory infection (6). The virulence plasmid also encodes a surface protein, IcsA, responsible for actin-based motility required for intra- and inter-cellular spread of the bacteria (7). In addition to the plasmid, many chromosomal genes, such as those encoded by the Shigella pathogenicity island (SHI)-1 and SHI-2, also contribute to virulence (8–10). For a better understanding of the genetic basis of Shigella pathogenicity, we and others previously sequenced the genome of Shigella flexneri 2a, the most prevalent of the Shigella species (11,12). The genome of S.flexneri includes a virulence plasmid and shares a large proportion of chromosomal genes with closely related non-pathogenic and enterohemorrhagic E.coli (EHEC) strains. This is in good agreement with a study based on multilocus sequencing (MLS) that places Shigella within E.coli (13). However, in the S.flexneri chromosome, there are hundreds of pseudogenes, numerous laterally acquired S.flexneri-specific sequences, as well as insertion sequence (IS)-mediated deletions, translocations and inversions, which have extensively reshaped the genome, presumably for the benefit of a fuller expression of virulence.
Despite the availability of the S.flexneri genome sequence, questions remain about the distinctive epidemiological and pathological features that Shigella species/strains exhibit. While S.flexneri (six serotypes) is primarily endemic in developing countries, S.sonnei (one serotype) is largely associated with episodes in industrialized nations. S.boydii (18 serotypes) is mainly endemic to the Indian subcontinent. Shigella dysenteriae serotype 1, which possesses the cytotoxic Shiga toxin (Stx), causes deadly epidemics in many of the poorer countries (6). To understand these variable features, we have determined the complete nucleotide sequences for the genomes of S.dysenteriae serotype 1 (strain 197), S.boydii serotype 4 (strain 227) and S.sonnei (strain 046) for comparison with the previously reported S.flexneri serotype 2a (strain 301) genome. Our study has revealed extensive diversity among the Shigella genomes, forming the genetic basis to explain the species/strain specific epidemiological and pathological features. Furthermore, many of the putative novel virulence genes identified may offer possible targets for the development of new treatment and prevention strategies.
S.dysenteriae serotype 1 (strain 197), S.boydii serotype 4 (strain 227) and S.sonnei (strain 046) were subjected to complete genome sequencing and are abbreviated to Sd197, Sb227 and Ss046, respectively. All strains were isolated from epidemics in China during the 1950s and were kindly provided by The Institute of Epidemiology and Microbiology, Chinese Academy of Preventive Medicine.
The whole genome sequence shotgun libraries for all strains were established as described previously (11), and ABI3730 automated sequencers were used for sequence collection. For each genome, we generated over 48000 paired-end shotgun reads with estimated 8- to 9-fold coverage. The initial genome assembly was processed by phred/phrap program with the Q20 criteria (14). As there were large numbers of IS-elements present in each of the genomes, to avoid mis-assembly contigs obtained by phrap were split at each dubious IS locus and their relationships were rebuilt manually based on paired-end reads location information using Consed (15). Approximately 4500–6000 sequencing reads were generated for primer-walking of large clones or for PCR amplicons during the finishing phase for each of the genomes. To verify the final assembly, we designed overlapping primer pairs covering the whole genome sequence using genomic DNA as template for PCR amplifications. The genome annotations were performed as described previously (11), and GenomeComp was used for genomic comparison with default parameters (16). Each pairwise comparison figure used in Figure 2 was exported from GenomeComp with a 1000 bp filter setting along with the scale setting of 3000 and 300 for chromosomes and virulence plasmids, respectively. The KEGG database was used for the metabolic pathways analysis (17).
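As a rough check on the sequencing depth quoted above, fold coverage can be approximated as the number of reads multiplied by the mean read length and divided by the genome size. The sketch below is illustrative only: the function name, the 800 bp mean read length and the 4.6 Mb genome size are assumptions made for the example and are not values reported in this study.

```python
def fold_coverage(n_reads, mean_read_length_bp, genome_size_bp):
    """Approximate fold coverage as (reads x mean read length) / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_reads * mean_read_length_bp / genome_size_bp


# Hypothetical inputs: 48000 reads (the lower bound given in the text),
# an ASSUMED mean read length of 800 bp and an ASSUMED genome size of 4.6 Mb.
example_coverage = fold_coverage(48_000, 800, 4_600_000)
print(f"Estimated coverage: {example_coverage:.1f}-fold")
```

With these assumed inputs the estimate falls within the 8- to 9-fold range stated in the text, but the true value depends on the actual read lengths and chromosome sizes.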
Complete genome sequences have been deposited in the GenBank. The accession numbers for chromosomes and virulence plasmids are Sd197, CP000034 and CP000035; Sb227, CP000036 and CP000037; Ss046, CP000038 and CP000039. In addition, genome annotation and comparative analysis can be obtained at ShiBASE (http://www.mgc.ac.cn/ShiBASE/) (18).
In common with the reported S.flexneri strains Sf301 and 2457T, the genomes of the newly sequenced Shigella strains all contain a virulence plasmid and a single chromosome (Table 1 and Figure 1). Note that we included data for both the Sf301 and 2457T genomes in Tables 1–5 for a complete comparison. However, since variations between the 2457T and the Sf301 genomes are minute (12), we use only the Sf301 genome for comparison with the newly sequenced genomes to avoid redundant descriptions. All the virulence plasmids have nearly identical (R100-like) replication origins and maintenance genes, including repA, copA and copB for replication, and the parA and parB genes for partitioning. The plasmid from S.flexneri is also known to have post-segregation killing systems ccdA/ccdB and mvpA/mvpT (19,20). While ccdA/ccdB is absent only from pSS_046 in Ss046, mvpA/mvpT is intact in all the virulence plasmids. The ~30 kb cell-entry region, encoding the Mxi-Spa TTSS and Ipa proteins, is generally conserved in virulence plasmids pCP301, pSD1_197 and pSS_046. There is, however, some polymorphism, with ipaD showing the most, with 41 polymorphic sites (4.1% of the coding sequence). A pairwise analysis of the ipaD coding sequences showed that only in the case of pCP301 and pSS_046 is the synonymous to non-synonymous substitution ratio (Ks/Ka) smaller than 1 (0.71). The Ks/Ka ratios are 1.85 and 1.73 for pCP301 and pSD1_197, and pSS_046 and pSD1_197, respectively. In fact, the Ks/Ka ratios for the majority of genes in the cell-entry regions are greater than 1 (data not shown), suggesting that the selection pressure from the host has maintained the conservation of these coding sequences. This is in contrast to the notion that many plasmid genes that encode secreted effector proteins (e.g. ospB, ospC2, ospD2, ospF and mxiL) outside of the cell-entry region are under pressure for non-synonymous substitution (21).
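The proportion of polymorphic sites quoted for ipaD can be obtained by counting the alignment columns in which the sequences differ. The following is a minimal sketch, assuming the coding sequences have already been aligned to equal length; the function names and the toy sequences are hypothetical, and the calculation of Ks/Ka itself (which requires a codon substitution model) is not attempted here.

```python
def polymorphic_sites(aligned_seqs):
    """Return 0-based column indices at which the aligned sequences differ."""
    if not aligned_seqs:
        return []
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    upper = [s.upper() for s in aligned_seqs]
    return [i for i, column in enumerate(zip(*upper)) if len(set(column)) > 1]


def polymorphic_fraction(aligned_seqs):
    """Fraction of alignment columns that are polymorphic."""
    sites = polymorphic_sites(aligned_seqs)
    return len(sites) / len(aligned_seqs[0])


# Toy alignment (NOT real ipaD sequence): columns 3 and 5 differ.
toy_alignment = ["ATGAAA", "ATGAAC", "ATGGAA"]
toy_sites = polymorphic_sites(toy_alignment)
toy_fraction = polymorphic_fraction(toy_alignment)
```

Applied to the three aligned ipaD coding sequences, the same count divided by the alignment length would give the percentage reported in the text.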
The cell-entry region is bracketed by IS100 and IS600 in all the virulence plasmids, suggesting the transmission of a common ancestral form of the virulence plasmid to all Shigella or, alternatively, that the cell-entry region was transmitted to all the virulence plasmids from related sources. The cell-entry region and the icsA gene are, however, deleted from pSB4_227 in strain Sb227. Additionally, pSB4_227 lacks a segment of ~30 kb corresponding to the region between icsA and ccdA in pCP301 (Figure 2b). Since Sb227 was positive in the Sereny test when isolated and ipaB was detected previously by PCR (China CDC record), loss of the cell-entry region and icsA has probably occurred during long-term storage. Thus, caution is needed when interpreting gene decay via deletions in the Shigella genomes, as the huge numbers of IS-elements present can riddle the genomes considerably during storage.
The Shigella chromosomes have the same replication origin and terminus as those of MG1655 (22), indicating that they probably have the same replication mechanism as E.coli. In all the Shigella genomes, the rRNA operons map to approximately the same relative positions as in MG1655, indicating that there is no DNA recombination between rRNA operons as observed previously in some Shigella strains (23). All currently sequenced Shigella and E.coli genomes available from GenBank have in total ~3 Mb in common (or backbone). This potentially encodes 2790 proteins accounting for 65% of coding capacity of the MG1655 genome, which likely includes genes essential for bacterial survival and growth. Within the backbone there are 2393 orthologous genes shared by all four genomes. Details about orthologous genes between each pair of the genomes can be found in ShiBASE at http://www.mgc.ac.cn/ShiBASE/Orth_order.htm#table. Within the backbone there are 313 genes which are pseudogenes in at least one of the Shigella genomes (Supplementary Table S1), indicating that probably none of these is essential for survival and growth. The chromosomes of Sd197, Sf301 and Sb227 are smaller than MG1655, while that of Ss046 is larger (Table 1). This variation in genome size is due mainly to deletions and insertions of >5 kb.
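The counts above follow simple set logic: genes shared by all genomes form an intersection, and backbone genes that are pseudogenes in at least one genome form the union of the per-genome pseudogene sets restricted to the backbone. A minimal sketch, assuming each genome is represented as a set of ortholog-group identifiers; the identifiers, genome labels and function names are hypothetical and do not correspond to ShiBASE entries.

```python
def shared_orthologs(genomes):
    """Ortholog groups present in every genome (set intersection)."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def backbone_pseudogenes(backbone, pseudogenes_by_genome):
    """Backbone genes annotated as a pseudogene in at least one genome."""
    any_pseudo = set().union(*pseudogenes_by_genome.values())
    return backbone & any_pseudo


# Toy data with hypothetical ortholog-group identifiers.
toy_genomes = {
    "genomeA": {"g1", "g2", "g3"},
    "genomeB": {"g1", "g2"},
    "genomeC": {"g1", "g2", "g4"},
}
toy_pseudogenes = {"genomeA": {"g2"}, "genomeB": set(), "genomeC": {"g4"}}

toy_shared = shared_orthologs(toy_genomes)
toy_backbone_pseudo = backbone_pseudogenes(toy_shared, toy_pseudogenes)
```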
The most striking feature of the Shigella genomes is their highly dynamic nature due to the presence of hundreds of IS-elements in each of the genomes (Table 2). IS-elements are capable of causing many kinds of DNA rearrangements (24), and the many rearrangements observed (deletions as well as translocations and inversions) are likely the result of the copious numbers of IS-elements. The Sd197 genome shows the most rearrangements and is considerably smaller than the MG1655 genome due to a large number of deletions (Table 1). The genome of this Shigella strain also possesses the greatest number of IS-elements, mainly in the form of IS1N (Table 2), which may be responsible for many of these rearrangements.
The deletions are often associated with translocations and inversions, which interrupt the collinearity of the Shigella and the MG1655 chromosomes (Figure 2a). Sd197 and Sb227 have more translocations and inversions involving DNA segments >5 kb than Sf301 and Ss046 (Table 1). Therefore, the collinearity with MG1655 chromosome is more severely interrupted in Sd197 and Sb227. All Shigella chromosomes have inversions at the replication origins and termini, which have been suggested to be recombination hotspots (25). The rearrangements at the termini can be very complex. Although a single inversion, probably mediated by IS1, is present in Ss046, several inversions and translocations of different sizes, and probably mediated by different IS-elements, are present in the other genomes (Figure 2a). This suggests that the rearrangements at the termini were formed through independent recombination events among the Shigella genomes.
GC skew is the measurement of mononucleotide frequencies ([fG − fC]/[fG + fC]). The GC compositional strand bias observed in the E.coli genome, which gives rise to two distinctive replichores, has been hypothesized to reflect the biased mutational traits in codon positions in the leading and lagging strands under natural selection (22). Owing to the many inversions, the GC skew has been distorted in the Shigella genomes particularly in Sd197 and Sb227 (Figure 1).
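The GC skew defined above can be computed along a chromosome in fixed-size windows, which is how a distortion such as that caused by inversions would become visible. The sketch below is a minimal illustration, not the procedure used to draw Figure 1; the function names, the window size and the toy sequence are assumptions. Because the same window length divides both frequencies, raw G and C counts give the same ratio as the frequencies fG and fC.

```python
def gc_skew(seq):
    """GC skew of one sequence: (G - C) / (G + C); 0.0 if no G or C present."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq, window=10_000, step=None):
    """Return (start, skew) pairs for windows of `window` bp taken every `step` bp."""
    if window <= 0:
        raise ValueError("window must be positive")
    step = window if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    last_start = max(len(seq) - window, 0)
    return [(start, gc_skew(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]


# Toy sequence: a G-rich half followed by a C-rich half, so the sign flips.
toy_skew = windowed_gc_skew("GGGGCCCC", window=4)
```

In an undisturbed chromosome the sign of the skew is expected to switch near the replication origin and terminus; an inverted segment would appear as a stretch whose sign is opposite to that of its replichore.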
Inversions are often accompanied by deletions. The ompT gene is removed by inversion-associated deletions from all four genomes. This is the basis of the so-called kcp locus, necessary for Shigella to cause keratoconjunctivitis in guinea pigs, because OmpT reduces IcsA expression (26). The cadA gene, responsible for converting lysine to cadaverine, which in turn attenuates virulence (27), is missing from Sf301 and Sb227 owing to inversion-associated deletions. In Sd197 and Ss046, cadA is inactivated via a frameshift and an IS insertion, respectively.
Compared with MG1655, Shigella strains not only have many more copies of IS-elements but also have additional IS-species, such as IS1N, IS600 and IS629 (Table 2). Within the Shigella genomes, IS1 is predominant in the Sf301, Sb227 and Ss046 chromosomes whereas IS1N is copiously present in the Sd197 chromosome. Intact IS21 and IS630 are present only in Ss046, while the newly identified ISSbo6 is found mainly in Sb227 chromosome. ISSbo6 is similar to ISEc8 found adjacent to the locus of enterocyte effacement (LEE) pathogenicity island in EHEC (28). Furthermore, most copies of the ISSbo6 are located within SHI-1, SHI-2 and ipaH islands (see below) in the Sb227 genome. The virulence plasmids and chromosomes share most of the IS-species, suggesting that inter- and intra-replicon translocation and replication has occurred, leading to large numbers of IS-elements in the genomes.
The virulence plasmids also display a dynamic nature with many IS-mediated deletions, translocations and inversions. Plasmid pSS_046 from Ss046 shows the closest collinearity to pCP301 (Figure 2b). Apart from IS-mediated inversions and translocations, the collinearity is interrupted downstream to the replication origin, ori, due to a 13 kb insertion in pSS_046 that carries genes for O-antigen synthesis as described previously (29). pSB4_227 from Sb227 also shows the collinearity with pCP301 except for the ~80 kb deletion including the cell-entry region (see above). Plasmid pSD1_197 from Sd197 has a notable translocation involving a DNA segment of ~50 kb associated with duplication of the ori sequence and nearby repA and copAB genes. As a result, the cell-entry region is sandwiched by two sets of ori, repA and copAB (Figure 2b). The original ori sequence is truncated, so that plasmid replication is probably performed by the functional duplicated ori sequence.
A number of notable loci present in pCP301 are absent from pSS_046. These include sepA (which codes for an extracellular serine protease involved in tissue invasion) (30), phoN (encoding a non-specific phosphatase), stbAB (encoding one of the two partition systems) and ipgH (encoding a sugar phosphate). The sepA gene has been shown by DNA hybridization to be absent from a number of other S.sonnei virulence plasmids (21).
The distribution of putative virulence genes shows diversity among the Shigella genomes (Tables 3 and 4). The so-called Shigella pathogenicity island (SHI)-1 in Sf301 encodes three characterized proteins: the autotransporter proteases Pic and SigA, and the enterotoxin ShET1, which is encoded by the setAB genes that lie entirely within, and on the complementary strand of, the pic coding sequence. Pic is implicated in mucinase activity, serum resistance, and hemagglutination (31), while SigA is capable of casein degradation, is cytopathic for HEp-2 cells and along with ShET1 contributes to fluid accumulation in rabbit ileal loops (32,33). SHI-1 is wholly absent from Sd197, and in Sb227 and Ss046, although sigA is present, the pic/setAB coding sequence is missing. A second copy of sigA is also present in Sb227.
SHI-2 was originally identified at the selC tRNA locus in S.flexneri. It carries the iut/iuc operon encoding an aerobactin system for iron acquisition (10). SHI-2 is present in Ss046 but unlinked with the selC gene, which appears to be caused by an inversion near the replication origin in Ss046, as evidenced by the fact that selC tRNA gene is located in the leading strand in MG1655 and Sf301 but situated in the lagging strand in Ss046. The previously reported S.boydii SHI-3 is present in Sb227. SHI-3 carries the same iut/iuc operon as SHI-2 but is linked with the pheU tRNA locus (34). Sd197 has neither SHI-2 nor SHI-3 but solely possesses the shu and the iro operons (Table 3). The shu operon encodes a TonB-dependent heme transport system (35). The iro genes were originally identified in Salmonella enterica as a ferric iron transport system (36).
One of the interesting features of the Sf301 genome is the presence of 12 copies of the ipaH genes: 5 on the plasmid and 7 on the chromosome, of which 5 are located in ipaH-islands which are apparently acquired via phage-mediated lateral gene transfer (11). All the ipaH products have a conserved C-terminal half of 260 amino acid residues but variable N-terminal halves, within which are leucine-rich repeat regions implicated in protein–protein interaction. The plasmid encoded IpaH7.8 is involved in the escape of Shigella from phagocytic vacuoles in macrophages (37), and the bacteria express more IpaH9.8 when inside host cells (38). Hence, the multiple ipaH are an interesting phenomenon. It is now confirmed that there are multiple ipaH genes in all Shigella genomes (Tables 3 and 4), except that ipaH1.4 and ipaH2.5 are absent in pSB4_227 and pSS_046, respectively, and that the coding region of ipaH1.4 in pSD1_197 is truncated due to an IS629 insertion. Thus, the necessity for these two ipaH genes in virulence is in doubt. The chromosomal ipaH genes are mostly within ipaH islands and are always next to a gene that encodes a protein homologous to a bacteriophage P27 protein (accession no. NP_543109) of unknown function. However, the plasmid ipaH genes from all strains are unlinked with the phage gene. Hence, the virulence plasmids and the chromosomes may have acquired the ipaH genes from different sources or by different mechanisms.
The type II secretion system (T2SS) encoded by genes of the general secretion pathway (gsp) is widely distributed in Gram-negative bacteria (Supplementary Figure S1). The known E.coli T2SS encoded by the yhe genes at 74.5 min of the MG1655 chromosome is absent in all four sequenced Shigella genomes (Figure 3a). However, there is a novel set of gsp genes in the Sd197 and Sb227 chromosomes, which is absent in MG1655 and the other Shigella genomes (Figure 3b). The gsp products from Sd197 and Sb227 show some similarity to those of the E.coli yhe genes, but significantly greater similarity to those of the gsp genes from enterotoxigenic E.coli (ETEC) and Vibrio cholerae responsible for secreting the E.coli heat labile toxin (Ltx) and cholera toxin (Ctx), respectively (39,40) (Supplementary Figure S1). Shiga toxin, Stx, is encoded by a prophage at a locus distant from gsp as described previously (41). The Sb227 T2SS is likely to be inactive due to a frameshift in gspC and a nonsense mutation in gspD. Interestingly, the gsp genes are present as an island at the pheV tRNA locus in both Sd197 and Sb227. However, in Sf301 and Ss046, this locus is occupied by SHI-1 (Figure 3b). So, the pheV tRNA locus in Shigella strains may be a hotspot for insertion of laterally transferred genes.
In the Shigella genomes, there are regions similar to the ‘O-islands’ (OI) from EHEC EDL933 (42). Some of these O-island-like sequences may be of significance to virulence. In Sd197, the open reading frames (ORFs) SDY0416-SDY0425 may encode a RTX-toxin-like exoprotein and a transporter similar to those encoded by OI #28. In Sd197 and Sf301, there are ORFs (SDY1240-SDY1242 and SF1192-SF1194, respectively) which encode a putative iron compound ABC transporter similar to those from EDL933 (Z1964-Z1966). In Sf301, Sb227 and Ss046, there are genes encoding a putative adhesin similar to that encoded by the EDL933 OI #144 (Z5029), which belong to the Yersinia YadA family that mediates bacterial adherence and invasion through binding to fibronectin and β1 integrin (43) and induces the production of interleukin-8 (44). However, only Ss046 encodes a protein of 1616 amino acids similar in length to that of EDL933 of 1588 amino acids. Sf301 encodes a protein with only 990 amino acids at the C-terminus, whereas Sb227 encodes a protein with a truncation of more than 200 amino acids at the C-terminus, both of which may not be functional.
Deletions and pseudogenes are effective mechanisms for loss of functions, and the inactivation of the ompT and cadA genes provide examples of how loss of some functions may increase the pathogenicity of Shigella (see above). Additionally, all the sequenced strains of Shigella have lost flagellar function due to mutations in many different genes (Supplementary Table S2). Fimbriae are also absent from these Shigella. In the EHEC genome (NC_002655), there are 14 loci involved in fimbrial biogenesis. None of the counterpart loci in the Shigella genomes is intact (Supplementary Table S2).
The central intermediary metabolism is conserved in all four Shigella species and MG1655. However, considerable variations have been found in carbohydrate and amino acid metabolism (Supplementary Table S2). Shigella bacteria do not synthesize lysine decarboxylase due to inactivation of cadA gene (see above), do not produce hydrogen sulfide from thiosulfate, do not produce gas from carbohydrate, do not use citric acid as a sole carbon source and do not grow on sodium acetate. Table 5 is a summary of the genetic basis for these negative Shigella properties in main clinical biochemical reactions (see Supplementary Table S2 for a complete list). Note that only Sb227 carries all genes necessary for utilization of d-mannitol, d-sorbitol and d-xylose (Table 5).
Lactose fermentation is a biochemical property commonly used for distinguishing Shigella from E.coli. However, some S.dysenteriae 1 and S.sonnei isolates ferment lactose slowly, which now can be explained genetically. In the genomes of Sd197 and Ss046 the key gene, lacZ (encoding β-d-galactosidase), is intact though lacY (encoding galactose permease) is a pseudogene (both of them are deleted from Sf301 and Sb227). Additionally, Sd197 and Ss046 have ORFs SDY2556 and SSO2450, respectively, which encode proteins similar to the sucrose permease from EHEC (NP_288931) sharing a conserved LacY domain and overall 34% identity with the lactose permease from Klebsiella pneumoniae (JT0487). This unspecialised galactoside transport function may compensate partially for the loss of LacY in Sd197 and Ss046 leading to slow lactose fermentation.
An MLS study (13) and the previously reported S.flexneri genomes (11,12) have suggested strongly that Shigella is within the species of E.coli. The complete genomes of S.dysenteriae, S.boydii and S.sonnei have provided additional supporting evidence; they all have ~3 Mb of genomic DNA in common with all published E.coli and Shigella genomes. The extensive diversity of the Shigella genomes revealed by the whole genome sequences supports the hypothesis that Shigella have emerged from diverse origins of E.coli. Recently, Lan et al. (45) have presented evidence based on MLS that EIEC strains are also derived from different origins of E.coli. One of the EIEC strains (serotype O112ac) is grouped into Shigella Cluster 2, and outliers of Shigella strains, S.dysenteriae type 1 and S.sonnei, are more closely related to EIEC strains. Based on a comparative genomic hybridization microarray, Fukiya et al. (46) have also shown that three out of four EIEC strains are closely related to three Shigella strains (S.flexneri 2a, S.sonnei and S.boydii) but more distant from EPEC, ETEC, EHEC and UPEC strains. Thus, there is little doubt now that Shigella and EIEC form a single pathovar of E.coli.
The extensive diversity of the Shigella genomes appears to be multi-factorial. First, Shigella has evolved from diverse genomic backgrounds of E.coli. Particularly, we must remember that Sd197 and Ss046 are outside of three main Shigella phylogenetic groups (13). Therefore, the diversity of the four sequenced genomes does not reflect the entire genome diversity within the Shigella. Second, putative virulence genes have been transferred by bacteriophages to selected genomes. The distribution of SHI-1 and SHI-2 provide an example of this. Third, convergent evolution has been facilitated by IS-mediated rearrangements. For example, ompT and cadA were inactivated by deletions involving different DNA segments in different genomes, and 14 orthologous fimbrial systems identified in the EHEC genome have all been inactivated in different ways in the Shigella genomes. Fourth, creation of independent pseudogenes, e.g. different genes for utilization of d-sorbitol are inactivated in different genomes (Table 5).
The diversity of the genomes provides a basis for further investigations into pathogenesis, epidemiology and microbial evolution. For example, it is known that S.dysenteriae produces Shiga toxin (47). However, it was unknown until now that it also has a T2SS. Given that Stx has an overall similar structure to Ctx and Ltx and that the T2SS from Sd197 shows extensive homology to those from V.cholerae and ETEC, it is highly likely that Stx is actively secreted from Sd197. Therefore, the S.dysenteriae T2SS ought to contribute significantly to pathogenicity as it enables toxin molecules to reach the target host cells from proliferating bacteria. Otherwise, the accumulated toxin molecules can be released only upon bacterial lysis. Of course, it is also of interest to investigate whether the Shigella T2SS secretes other putative virulence factors in addition to Shiga toxin.
A comparative genomic hybridization test indicates a wide distribution of the gsp genes among strains from all phylogenetic groups (J. Peng, X. Zhang, J. Yang, J. Wang, E. Yang, W. Bin, C. Wei, M. Sun and Q. Jin, unpublished data), which suggests that many strains possessed Stx before its subsequent loss, and that thereafter, in the case of Sb227, the T2SS was inactivated. It has been reported that an Stx-expressing prophage from an S.sonnei strain is able to form plaques on a number of different Shigella species and serotypes (48). We found in this study that Ss046 possesses remnants of the previously reported Stx-phage Φ P27 (49), which has a different gene content and organization to the Sd197 Stx-phage, and that Sf301 and Sb227 possess remnants of the Stx-phage of Sd197. Taken together, many Shigella strains have probably gained and then lost the Stx genes in the evolutionary past. Perhaps loss of the Stx genes has provided advantages to the bacteria for better adaptation to the human hosts, as causing more severe disease offers little benefit to the organisms for long term survival. Alternatively, according to the hypothesis by Escobar-Paramo et al. (50), the integration, retention and expression of certain virulence factors may be the result of the interaction between the newly introduced genes and the bacterial genomic background. Hence, perhaps only S.dysenteriae 1 and a few S.sonnei strains have the right genomic background to retain and express Stx stably.
In addition to Shiga toxin and the T2SS, S.dysenteriae type 1 alone possesses two iron acquisition systems, shu and iro. Though the shu system, responsible for heme uptake, is not essential for invasion and proliferation in cultured Henle cells, it still can be very important in vivo (51). Recently, Skaar et al. (52) showed that Staphylococcus aureus preferably imports heme iron over transferrin in vivo and that mutant strains defective in heme transport are severely attenuated in a Caenorhabditis elegans infection model. Thus, there is a need to establish whether S.dysenteriae also prefers heme over other iron sources during infection. In addition, it is important to identify the genetic basis of the previously observed heme transport activity in S.flexneri (53). A comparison of that yet unidentified heme transport system with the Shu system is necessary for a better understanding of the iron acquisition strategies that Shigella employs.
The iro genes were originally identified in S.enterica as a ferric iron transport system (36). As the receptor, the iroN product, has affinity to some iron-containing substrates produced by soil microbes, the iro system has been speculated to facilitate the growth of S.enterica in soil. Whether this system offers an advantage to S.dysenteriae for environmental survival over other strains or plays a role during infection requires further investigations. We must emphasize that the observed differences in gene content between the different species here are not necessarily characteristic for the different species. For example, the iro genes are only present in serotype 1 but not in other S.dysenteriae strains (J. Peng, X. Zhang, J. Yang, J. Wang, E. Yang, W. Bin, C. Wei, M. Sun and Q. Jin, unpublished data).
In general, the virulence of Shigella is in the order S.dysenteriae > S.flexneri > S.sonnei (54). The lack of ShET1, Pic and SepA in Ss046 may collectively make S.sonnei less virulent than S.flexneri, as these are probably the major determinants involved in the diarrhoeal phase of the infection. On the other hand, S.dysenteriae 1 infection generally has only a very limited diarrhoeal phase, but abrupt onset of acute dysentery (54). This may be due to its possession of factors such as the very potent Shiga toxin, and thus the lack of SHI-1 is insignificant to S.dysenteriae type 1.
The presence of large numbers of IS-elements in the Shigella genomes is likely the major cause of many of the genome rearrangements. IS1 dominates in Sf301, Ss046 and Sb227, and is associated with DNA rearrangements at many different loci in these three genomes. These events are unlikely to have been random; they must have occurred in accordance with properties of the whole genome, or have been promoted (or restrained) by other genetic loci. Previously ISIN, also known as iso-IS1, was found only in S.dysenteriae serotype 1 (55). However, we found that Sb227, Ss046 and the EHEC strain EDL933 all have a single copy of ISIN next to yiiX, a gene encoding a hypothetical protein. The uropathogenic E.coli (UPEC) strain CFT073 has three partial copies, but none is at this locus. Since Sd197 also has a copy of ISIN next to yiiX, this locus is likely to be the original site of ISIN acquisition in all of these strains except the UPEC strain. Thus, either the expansion of ISIN is permitted only in S.dysenteriae type 1, or ISIN has been transmitted into the other genomes fairly recently.
It is interesting that ISSbo6 is restricted to the SHI-1, SHI-2 and ipaH islands in Sb227. This indicates that those pathogenicity islands were acquired earlier than this IS-element, which is probably distributed mainly among microbes within the Indian subcontinent.
Inversions, probably IS-mediated, are another mechanism that has reshaped the genomes. One of the conserved genetic traits among the enteric bacteria, namely the strand compositional bias in G and C, or GC-skew, is distorted by inversions in the Shigella genomes (Figure 1). This may be significant for gene expression, as GC-skew reflects the biased mutational patterns at codon positions in the leading and lagging strands under natural selection (22). Shigella colonizes and proliferates in the cell cytosol, a niche unique amongst the enteric bacteria, and is thus likely to be under different selective pressure to change the expression of many genes compared with other enteric bacteria. Inversions and translocations can effectively place these genes on the preferred leading or lagging strand, in the preferred orientation and at the preferred distance from the replication origin for optimal expression.
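To illustrate how an inversion distorts GC-skew, the following is a minimal sketch using the common windowed definition, (G − C)/(G + C). It is not the procedure used to produce Figure 1, whose parameters are not given here; the function names, the window size and the toy sequence are assumptions made purely for illustration.

```python
def gc_skew(seq, window=1000, step=None):
    """Return (G - C) / (G + C) for successive windows along seq.

    window and step are illustrative choices, not values from the article.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has an undefined skew; report 0.0 instead.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy "chromosome": a G-rich segment followed by a C-rich segment.
g_rich = "GGGCA" * 4
c_rich = "CCCGA" * 4
original = g_rich + c_rich

# Simulate an inversion of the first segment: the inverted DNA is the
# reverse complement of the original, so its G and C counts are swapped.
inverted = reverse_complement(g_rich) + c_rich

skew_original = gc_skew(original, window=20)
skew_inverted = gc_skew(inverted, window=20)
print(skew_original)
print(skew_inverted)
```

In this toy case the skew of the inverted segment changes sign while the untouched segment keeps its value, which is the kind of local disruption of the skew pattern that an inversion produces in a real genome.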
Gene decay, or reductive evolution, is recognised as an important evolutionary mechanism in obligate intracellular pathogens such as Mycobacterium leprae (56). Shigella bacteria, being facultative intracellular pathogens, also employ such a mechanism. The Sd197 genome is clearly smaller than that of MG1655 (Table 1), and the other three sequenced genomes display a net loss of genetic material [excluding the IS sequences, which account for 7–12% of the genomes (Table 1)]. Besides deletions, the formation of pseudogenes also plays an important part in gene decay, leading to many important characteristics that favour Shigella pathogenesis.
For many pathogenic bacteria, flagella are responsible for chemotaxis and play a role in tissue invasion (57). Conversely, mammalian hosts detect the conserved domain on flagellin monomers through Toll-like receptor (TLR)-5, which triggers pro-inflammatory and adaptive immune responses (57). Shigella spends most of its time intracellularly during infection and is highly mobile within cells, polymerising actin by means of IcsA. Flagella are therefore dispensable, and flagellar synthesis is inactivated in all of the genomes through deletions as well as pseudogenes (Supplementary Table S2). This not only conserves energy but also allows evasion of TLR-5-mediated innate and adaptive immunity.
Adherence to the host cell surface via fimbriae is generally assumed to be important for bacteria to establish an infection. However, Shigella infective doses can be very low even though all the fimbrial genes orthologous to those of EHEC are inactivated. The YadA-like proteins are the only Shigella adhesins identified so far and are intact only in Ss046. Perhaps the efficient invasion mechanism provided by the Ipa proteins and the TTSS has removed the need for fimbriae and other adherence factors, which has led to the inactivation of the fimbrial genes as well as the yadA genes. While it is important to investigate the significance of the intact YadA in Ss046 for virulence, sequencing or northern blotting of more epidemiological S.sonnei strains will indicate whether or not the yadA gene is prone to gene decay.
In summary, the Shigella spp. have evolved from different strains of E.coli and have become highly specific human pathogens through extensive convergent evolution involving gain and loss of functions. A similar scenario is also observed in Typhi and Paratyphi A of S.enterica; through convergent evolution, mainly involving gene decay, these pathogens have become human restricted, with similar virulence properties (58). This study provides valuable information for further investigation of the pathogenicity, epidemiology and virulence of an important group of human pathogens, and some insight into how these pathogens have evolved.
Supplementary Data are available at NAR online.
The authors thank P. J. Sansonetti and K. Turner for critical reading of the manuscript. This work was funded by the National Basic Research Priorities Program (grant no. 2005CB522904) and the High Technology Research and Development Program (grant no. 2001AA223011) from the Ministry of Science and Technology of China. Funding to pay the Open Access publication charges for this article was provided by MSTC.
Conflict of interest statement. None declared.