Enteroinvasive Escherichia coli (EIEC), a distinctive pathogenic form of E. coli causing dysentery, is similar in many properties to bacteria placed in the four species of Shigella. Shigella has been separated as a genus but in fact comprises several clones of E. coli. The evolutionary relationships of 32 EIEC strains of 12 serotypes have been determined by sequencing of four housekeeping genes and two plasmid genes which were used previously to determine the relationships of Shigella strains. The EIEC strains were grouped in four clusters with one outlier strain, indicating independent derivation of EIEC several times. Three of the four clusters contain more than one O antigen type. One EIEC strain (an O112ac:H− strain) was found in Shigella cluster 3 but is not identical to the Shigella cluster 3 D2 and B15 strains with the same O antigen. Two forms of the virulence plasmid pINV have been identified in Shigella strains by using the sequences of ipgD and mxiA genes, and all but two of our EIEC strains have pINV A. The EIEC strains were grouped in two subclusters with a very low level of variation, generally not intermingled with Shigella pINV A strains. The EIEC clusters based on housekeeping genes were reflected in the plasmid gene sequences, with some exceptions. Two strains were found in the pINV B form by using the ipgD sequence, with one strain having an mxiA sequence similar to the divergent sequence of D1. Clearly, EIEC and Shigella spp. form a pathovar of E. coli.
Enteroinvasive Escherichia coli (EIEC) is a pathogenic form of E. coli that can cause dysentery (25). Making a distinction between EIEC and Shigella spp. has been known for a long time to be difficult and depends on a very limited number of characteristics. Shigella spp. have been shown unequivocally to be clones of E. coli by sequencing of housekeeping genes (references 32 and 33 and references therein).
Historically, EIEC was first described in 1944, when it was called paracolon bacillus, but it was later identified as E. coli O124. In the 1950s, another group of E. coli strains was found to cause experimental keratoconjunctivitis in guinea pigs by the Serény test—a trait common with Shigella. These strains were initially classified under Shigella as Shigella manolovi, S. sofia, Shigella strain 13, and S. metadysenteriae and were later placed in the E. coli subgroup EIEC as E. coli O164 (2, 23, 35). EIEC and Shigella spp. bear remarkable phenotypic likeness, with a reduction in the number of substrates utilized relative to commensal E. coli strains. These similar phenotypes may be attributed to the fact that these organisms spend much of their lifetime within eukaryotic cells and have a different nutrient supply from most E. coli strains (19). Most EIEC strains are Lac−, nonmotile and lysine decarboxylase negative (11, 38). The few biochemical properties differentiating E. coli and Shigella spp. include mucate and acetate. EIEC may be positive for one or both of the properties, while, with rare exceptions, Shigella strains are negative for both and more than 90% of typical E. coli strains are positive for both (4, 7).
A limited set of O antigens are found in EIEC strains, including O28ac, O29, O112ac, O121, O124, O135, O136, O143, O144, O152, O159, O164, O167, and O173 (5, 10, 14, 15, 21, 25, 30). Three of these EIEC-associated O antigens are identical to O antigens present in Shigella spp. (5), namely, O112ac, O124, and O152, with Shigella O antigens of S. boydii serotype 15/S. dysenteriae serotype 2, S. boydii serotype 3, and S. dysenteriae serotype 12, respectively. EIEC strains with those shared O antigens show a higher metabolic activity than the Shigella strains (10). Nonetheless, differentiation between EIEC and Shigella organisms with the same serotype often proves very difficult due to their almost identical biochemical and physiological traits. EIEC strains can be distinguished from other E. coli strains by testing their invasion capacity by the Serény test or by identification of bacterial invasion-associated proteins or genes via specific tests. However, these methods are not generally used for routine diagnosis, and EIEC strains are only provisionally identified by O serotyping with commercially available antisera in routine diagnostic laboratories (3).
EIEC and Shigella strains harbor a common 220-kb plasmid, collectively termed pINV, although specific names were given for some pINV plasmids, for example, pWR100 of S. flexneri 5, pMYSH6000 of S. flexneri 2a, and pSS120 of S. sonnei. In our previous analysis of Shigella strains (18), two major forms of the virulence plasmid with consistent sequence differences were revealed and referred to as pINV A and pINV B. The distribution of the two forms correlated well with the variation in chromosomal genes found in our earlier study, in which the majority of the Shigella strains were grouped into three clusters with a few outliers (33). All of the cluster 1 strains, consisting of the majority of S. dysenteriae and S. boydii serotypes plus S. flexneri serotype 6, contain the pINV A form, and cluster 3 strains, consisting of S. flexneri serotypes 1 to 5 and S. boydii serotype 12, have pINV B. Of the cluster 2 strains (seven S. boydii serotypes and S. dysenteriae serotype 2), pINV sequences could be obtained from only three strains, of which two contain the pINV A form and one contains the pINV B form. Of the outliers not in the three clusters, all contain either the pINV A or the pINV B form, with the exception of S. dysenteriae serotype 1, which contains a plasmid with a divergent sequence not corresponding to these two forms, and the S. boydii serotype 13 strain, which does not contain a plasmid.
The two EIEC strains included in the above study contain pINV A. In this study, we examine 32 EIEC strains to determine their relationships to Shigella strains by using both chromosomal and plasmid genes.
A total of 32 EIEC strains were used, and details are given in Table 1. Primers for PCR and sequencing are listed in Table 2. Primer sets for amplification of thrB and purN were designed by Pupo et al. (33), and those for amplification of ipgD and mxiA were designed by Lan et al. (18). Primer sets for the mdh-argR region and trpB were designed specifically for this study. The mdh-argR gene region has low-level variation, and we selected the most variable segment, including the intergenic region between mdh and argR, for amplification and sequencing.
The sequence was obtained directly from the PCR product. Double-stranded PCR product was purified with the Wizard PCR purification system (Promega, Madison, Wis.) to remove excess PCR primer and eluted in 30 μl of sterile distilled water, and the sequence was determined by the dye-terminator technique using a thermal cycler (Perkin-Elmer Cetus, Norwalk, Conn.) and automated 377 DNA sequencer (Applied Biosystems, Burwood, Victoria, Australia), as specified by the manufacturers.
DNA sequences were assembled and edited using PHRED, PHRAP, and CONSED software (13). Further analysis was undertaken using software available from the Australian National Genomic Information Service at The University of Sydney. Sequence comparisons were analyzed using the MULTICOMP package (34). MULTICOMP calculates nucleotide diversity (π) by the method described by Nei and Miller (28) and average pairwise percentage difference. Molecular evolutionary relationships among each of the genes studied were examined by the neighbor-joining method of tree construction (27, 36), based on distances estimated using the two-parameter method of Kimura (17). Phylogenetic trees and bootstrap analysis to determine the statistical stability of each node were done using PHYLIP (version 3.5; Joseph Felsenstein, Department of Genetics, University of Washington, Seattle, Wash.).
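The distance and diversity measures named above can be illustrated with a minimal Python sketch. It is not the MULTICOMP or PHYLIP implementation: the function names `k2p_distance` and `average_pairwise_percent_difference` are hypothetical, the sequences are assumed to be pre-aligned, and sites with gaps or ambiguous bases are simply skipped. The second function is an uncorrected average over all pairs and does not reproduce any sample-size correction that the Nei and Miller method may apply.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the
    proportions of transitional and transversional differences.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def average_pairwise_percent_difference(seqs):
    """Mean percentage of differing sites over all pairs of aligned sequences."""
    percentages = []
    for s1, s2 in combinations(seqs, 2):
        pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
                 if a in "ACGT" and b in "ACGT"]
        differing = sum(1 for a, b in pairs if a != b)
        percentages.append(100.0 * differing / len(pairs))
    return sum(percentages) / len(percentages)
```

A matrix of such pairwise distances is the kind of input a neighbor-joining program takes; tree construction itself is left to dedicated software.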
Utilization of acetate and mucate fermentation was tested using the methods described by Ewing (10).
The GenBank accession numbers for the nucleotide sequences determined in this study are AY627050 to AY627276.
A total of 3,390 bp of sequence data was analyzed for each strain, comprising 1,032 bp of thrB, 898 bp of purN, 526 bp of trpB, and 934 bp of mdh-argR. The four regions are within regions sequenced previously for Shigella strains, comprising 2,032 bp of the thrB-thrC region, 2,101 bp of the purM-purN region, 1,486 bp of the trpB-trpC region, and 1,541 bp of the mdh-argR region, respectively (Table 2). The sequence alignment for informative sites (sites that affect the tree topology because they have at least two different bases present in at least two strains each) is shown in Fig. 1 (see supplementary Fig. 1 for informative sites of Shigella and EIEC strains combined at http://www.mmb.usyd.edu.au/archives/). It is very clear that most EIEC strains fall into one of four groups and that the same strains are in each group for all four regions of the chromosome, although the purN and mdh-argR regions have few informative sites for separation of the groups. One strain, M2339, appears to be a recombinant, and another, M2332, fits within cluster 2 Shigella strains.
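The definition of an informative site given above (at least two different bases, each present in at least two strains) can be expressed as a short Python sketch. The function names and the toy alignment in the usage note are hypothetical and are not data from this study; gaps and ambiguous bases are ignored by assumption.

```python
from collections import Counter


def _column_counts(alignment, i):
    """Count unambiguous bases (A, C, G, T) in column i of the alignment."""
    return Counter(s[i].upper() for s in alignment if s[i].upper() in "ACGT")


def polymorphic_sites(alignment):
    """Indices of columns in which more than one base is present."""
    length = len(alignment[0])
    return [i for i in range(length) if len(_column_counts(alignment, i)) >= 2]


def informative_sites(alignment):
    """Indices of columns with at least two bases that each occur in at
    least two sequences."""
    length = len(alignment[0])
    sites = []
    for i in range(length):
        counts = _column_counts(alignment, i)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i)
    return sites
```

For a toy alignment such as `["AACA", "AACT", "AGTA", "AGTA"]`, columns 1 and 2 are informative, whereas column 3 is polymorphic but not informative because the variant base occurs in only one sequence.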
A tree for the combined data is shown in Fig. 2. Published data for the 46 Shigella and 8 E. coli reference (ECOR) strains (33) were included in the analysis. We also sequenced an additional 12 ECOR set strains to increase their representation. The three Shigella clusters (clusters 1, 2, and 3) identified by the previous study were retained. All but one (M2339) of the EIEC strains are in clusters with at least three strains, named clusters 4, 5, 6, and 7. Cluster 7 consists of strains M2329, M2334, and M2342, with the same O antigen, while the other clusters contain two or more O-antigen types. Bootstrap analysis using 1,000 replicates gave strong support to each cluster.
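The resampling step that underlies a bootstrap analysis of this kind can be sketched in Python as follows. This is only an illustration of drawing alignment columns with replacement; it is not the PHYLIP procedure, the function names and the `seed` parameter are hypothetical, and building a tree from each replicate and counting how often a cluster recurs is left to phylogenetic software.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the same
    number of sequences and the same alignment length."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates pseudo-alignments from one aligned data set."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]
```

The bootstrap value for a node is then the percentage of replicate trees in which the same group of strains appears together.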
The pINV genes ipgD and mxiA were sequenced for 32 strains. Both genes have been used for the study of Shigella pINV (18). The ipgD gene is 1,617 bp long, is located at the beginning of the mxiA operon, and encodes a product that is involved in entry of the host cell on bacterial invasion (29), while the 2,006-bp mxiA gene encodes an essential component of the type III secretory machinery (31). Only a partial sequence of ipgD for M2357 was obtained, and ipgD of M2334 failed to be amplified by PCR. Two strains (M519 and M520) sequenced previously (18) were included.
A 1,630-bp region containing ipgD (1,617 bp) was analyzed. A comparison of the EIEC sequences obtained revealed an average pairwise difference of 0.51% and indicated the presence of 44 polymorphic sites, of which 43 are informative and are summarized in Fig. 3. M2339 and M2355 are very similar, with a highly divergent sequence which is similar to the pINV B form.
An 1,866-bp stretch of the mxiA sequence was analyzed in each strain. Comparison of the sequences obtained indicated an average pairwise difference of 0.1%. Analysis of the mxiA alleles within all EIEC strains showed that the region contained 26 polymorphic bases but only 2 informative bases (Fig. 3). Of the singularly polymorphic bases, 22 are in M2339, which has the same highly divergent sequence as D1. However, M2339 has an ipgD sequence similar to those of the pINV B form. M2355 is identical to M2339 in its ipgD gene, while the mxiA gene is similar to those of the pINV A form.
The data obtained in this study were combined with previous results (18) to generate a phylogenetic tree for all Shigella and EIEC strains. The combined tree for plasmid genes ipgD and mxiA is shown in Fig. 4. Only strains for which sequence data for at least ipgD were obtained are included. The EIEC strain with the mxiA sequence only, M2334, is excluded, since there is very little variation in the pINV A mxiA gene. It is evident that all but two EIEC strains have the pINV A form. The combined tree indicates the presence of two distinct groups within the A form: A1 and A2. However, the bootstrap value at the node of separation of these two subclusters within the A form is less than 50%. Nevertheless, there are six sites in ipgD which support the division if only EIEC strains are considered (Fig. 3) and three sites for all strains (see supplementary Fig. 2 at http://www.mmb.usyd.edu.au/archives/). The majority of the cluster 4 and 5 strains are in the pINV A1 subcluster, with identical sequences which are slightly divergent from those of the Shigella strains. However, four and two strains of clusters 4 and 5, respectively, are in pINV subcluster A2. Four cluster 6 strains and two cluster 7 strains have identical pINV sequences. M2339 and M2355 ipgD sequences grouped together and appear to be an ancestral form of pINV B. However, M2339 grouped with D1 if only the mxiA sequence was used.
With the exception of one strain, which falls into Shigella cluster 2, the EIEC strains were found in four separate clusters of at least three strains each and one outlier strain, indicating that the EIEC phenotype has arisen several times. In our previous study, the Shigella phenotype was also shown to have originated separately several times during its evolution (33). The EIEC clusters had fewer O-antigen forms than the Shigella clusters did. While Shigella strains appear to have diverged more phenotypically from typical E. coli than EIEC strains have, none of the clusters defined contain both Shigella and EIEC, except for EIEC strain M2332, which could be a misclassification, as discussed below. EIEC clusters 4 and 5 could have a common origin since no ECOR set strains separate the two clusters. This may also be true for Shigella clusters 2 and 3. Since in each case the cluster branch lengths are long and the bootstrap value is high, the two clusters must have been separated for a long time even if they were derived from a common pINV-carrying strain.
The diversification within Shigella clusters 1 and 2 is estimated to have originated 50,000 to 270,000 years ago, while cluster 3 apparently diversified later, at 35,000 to 170,000 years ago (33). On comparison of EIEC and Shigella clusters, it is clear that the EIEC clusters have diverged less than Shigella clusters. Assuming that the housekeeping genes are evolving at approximately equal rates within the two forms, we may hypothesize that the EIEC strains have had less time to diverge within their clusters and hence may have arisen at a later date than the Shigella clusters.
Two Shigella outlier strains (Shigella strain D8 and S. sonnei) were placed very close to EIEC strains but are possibly derived independently. M2339 is placed very close to Shigella strain D8, but they are separated by ECOR strains and M2339 is more similar to the ECOR strains than to D8 (Fig. 1 and supplementary Fig. 1 [http://www.mmb.usyd.edu.au/archives/]). Further, the plasmid gene sequences are very different. The ipgD gene of M2339 is similar to those of the B form, and mxiA is similar to that of D1. S. sonnei is close to cluster 6, with identical informative sites in thrB, and shares five characteristic informative sites of cluster 6 but is dissimilar in the other two gene regions. For the plasmid sequences, S. sonnei has a B-form plasmid while cluster 6 strains have the A form.
As evidenced in Fig. 2, the EIEC strains analyzed clustered separately from Shigella, with only one exception: M2332 grouped with Shigella cluster 2 strains in every gene studied. This, coupled with similar plasmid gene sequences and identical O antigen (O112ac) to the B15 and D2 O antigen, leads to the conclusion that M2332 is probably a Shigella strain that has been misclassified.
Our sample of strains contains 10 of the 14 O antigens reported to be associated with EIEC (5, 10, 14, 15, 21, 25, 30). The majority of the 10 O antigens sampled were grouped within a single cluster. Of the four EIEC clusters, three have more than one O antigen, with O28, O29, O124, O136, and O164 in cluster 4, O124, O135, O152, and O164 in cluster 5, and O143 and O167 in cluster 6. Cluster 7 has only O144. It is interesting that two O antigens are in two or more clusters, with O124 in clusters 4 and 5 and one outlier strain and O164 in clusters 4 and 5. Two of them are also found in Shigella strains (see below).
This study confirms the trend of rapid expansion of O antigens in the development of invasive clones of E. coli as observed in Shigella clones (19, 33). There are 46 O antigens described for Shigella strains. However, D2 and B15 are the same, and most S. flexneri forms are minor variants of one basic form; hence, there are 33 unique O-antigen forms (19), of which 12 are identical to known E. coli O antigens while 21 are unique to Shigella clones (33). The total number of novel O antigens in Shigella represents 11% of the known E. coli/Shigella O-antigen forms, which is disproportionately high considering that Shigella represents only a small fraction of the E. coli diversity. Of the 33 O antigens, 18 were identified in Shigella clusters 1 and 2. The EIEC clusters have far fewer O antigens. This may be because the EIEC clusters arose later than Shigella clusters, based on the level of sequence variation within clusters, and have had a shorter time for expansion of O antigens than Shigella has. However, it may be that EIEC strains have received less attention than Shigella strains and that less common forms have not been reported.
There are a number of Shigella O antigens common within EIEC, including O124 (D3), O112 (D2/B15), O143 (B8), O152 (D12), and O167 (B3). The only O112 strain included in this study seems to be a Shigella strain, as discussed above. This disproportionate representation of certain O-antigen forms in Shigella and EIEC strains supports the notion that O-antigen specificity may be important in pathogenicity (12, 24, 39).
The clustering of Shigella clusters 1 and 3, as observed previously (18), and EIEC cluster 6 strains is consistent between chromosomal and plasmid genes. However, there are several inconsistencies in the clustering of strains in other clusters, indicating lateral transfer of the pINV plasmid between clusters. Seven cluster 4 strains have pINV subcluster A1 plasmid, while four other cluster 4 strains have pINV subcluster A2 plasmid. Both groups are represented by three O-antigen forms. It is not known whether the primordial pINV form of cluster 4 is A1 or A2. Cluster 5 strains have either pINV subcluster A1 (seven strains) or A2 (two strains) or pINV B (one strain), suggesting that the last two cases are due to transfer of the virulence plasmid or recombination involving the two plasmid genes studied. The cluster 5 strain M2356 is likely to have a recombinant ipgD gene with two informative sites characteristic of pINV subcluster A1 (Fig. 3). Transfer of pINV is also seen in Shigella cluster 2, with two strains having the pINV A form and one having the pINV B form (18).
The plasmid forms found within Shigella tend to group away from those within EIEC, with a longer branch length in comparison to those of EIEC strains, which are tightly clustered, suggesting that Shigella has diverged to a greater extent within its clusters than EIEC has. This higher level of divergence is also reflected in the chromosomal genes, as discussed above.
The long-standing demarcation of Shigella and EIEC is based in part on perceived differences in virulence. EIEC strains are generally said to be less virulent than Shigella strains, with a higher infectious dose of EIEC needed in volunteers (8, 9). However, the Shigella strains used in these studies were generally D1, S. sonnei, and S. flexneri strains, which are known to be the more virulent Shigella forms and more prevalent in epidemiological terms. We have now studied a large sample of EIEC strains and shown that the virulence plasmids of the EIEC strains are the pINV A form, which is very closely related to other pINV A plasmids of Shigella strains. There does not seem to be a case for separation of Shigella and EIEC as classes on the basis of the virulence plasmid. However, differences in pINV may play a role in virulence. We recently compared the pINV A plasmid from S. flexneri serotype 6 and the pINV B plasmid from S. flexneri serotype 2a and found that the majority of the virulence-associated genes are under strong selection pressure for change (20). There are also pINV genes that are variably present in Shigella and EIEC (20) strains, including sepA, encoding serine protease, the major secreted protein of S. flexneri 2a, and ospD3 (senA), encoding an enterotoxin, which is found in only 75% of EIEC strains and 83% of Shigella strains (26). Variation in virulence within and between EIEC and Shigella strains should be further studied based on the phylogenetic relationships of the strains and the plasmids. This may help us to better understand the pathogenesis and epidemiology of Shigella and EIEC infections.
EIEC strains are very similar to Shigella strains in biochemical properties but generally do not fit the full definition for the genus Shigella, since some are motile or lactose fermenting and are seen essentially as traditional E. coli strains. However, if one examines the profile of inactive E. coli (including EIEC) as described by Farmer et al. (11), EIEC is more similar to Shigella than to typical E. coli and some EIEC strains have essentially all the properties of Shigella strains. EIEC can be differentiated from Shigella only by a very limited number of tests including L-serine, D-xylose, and/or sodium acetate utilization and mucate fermentation (7). EIEC isolates may be positive for one or more of the tests, but Shigella strains are generally negative (7). With 12 independent derivations of EIEC and Shigella within E. coli (five for EIEC and seven for Shigella), it is remarkable that biochemical properties, although few, were found to distinguish Shigella strains from EIEC strains. Variation within EIEC strains and between EIEC and Shigella strains in phenotypic properties is now best reassessed based on their natural groupings. We tested all 32 EIEC strains for acetate utilization and mucate fermentation and found that 10 and 15 strains were negative for these two properties, respectively, and 4 strains were negative for both. The distribution of these properties is consistent with the phylogenetic clustering in two cases. All cluster 4 strains are mucate negative, while none of the cluster 6 strains is negative for either property (Fig. 2). The loss of acetate utilization seems to be sporadic, showing independent loss even within a cluster.
Shigella and EIEC strains have many common characteristics, including the lack of several catabolic pathways widely present in E. coli, some of which are known to have resulted from independent loss of chromosomal properties. The loss of catabolic functions is presumably related to the life-style of Shigella and EIEC strains, because some properties are redundant since these strains spend much of their time within eukaryotic cells (19, 33), although at least one property, lysine decarboxylation, is also very deleterious for Shigella and EIEC, with strong selection against its presence (6, 22). As expected, sequencing of two Shigella genomes from S. flexneri serotype 2a strains 301 (16) and 2457T (40) has revealed many pseudogenes relative to other E. coli genomes, with 254 and 372 in strains 301 and 2457T, respectively, although some of the differences are reported to be due to annotation criteria (40). There are unique sets of pseudogenes in each strain, although they are the same serotype. Variation of phenotype properties in EIEC and Shigella strains is likely to be much larger than that seen from commonly used biochemical tests.
EIEC and Shigella strains clearly form a distinctive Shigella-EIEC pathovar, with most EIEC and Shigella strains being found in distinct clusters. Acquisition of the pINV plasmid is a crucial step in the development of invasive forms of E. coli, followed by convergent evolution of properties such as the loss of specific catabolic pathways and motility and expansion of O-antigen diversity. There are many lineages of invasive E. coli. It appears that the EIEC lineages have been derived more recently than the Shigella strains. Resolution of the relationships provides a base for further studies of the rate of gene decay and virulence variation in EIEC and Shigella.
All but two EIEC strains have pINV A, which seems to be more frequently transferred to other E. coli. It is known from tests on several S. flexneri strains that pINV is unable to initiate conjugation (37). Our recent comparison of the F6 pINV A plasmid with the F5 pINV B form indicated that, from the substitution patterns in the incomplete tra region, the remaining tra genes must be functional, at least in some strains. No complete sequence for pINV A is available to determine whether it has retained more transfer-related genes to assist the process. It should also be noted that genes present in other plasmids or on the chromosome may complement the transfer functions present on the pINV plasmid.
Why has EIEC retained some E. coli properties that have been lost in multiple lineages of Shigella? It is likely that EIEC strains are in an intermediate stage and are a potential precursor of “full-blown” Shigella strains, as judged by the lower level of variation within EIEC clusters than within Shigella clusters. Motile strains were present within clusters 4 (O136:H9) and 5 (O124:H30), but the majority have lost the property or have retained it at a reduced level (1). The selection pressure for loss of motility must be low in comparison to the gain of new O antigens, which has occurred several times in each cluster. It is also possible that EIEC is a distinctive form, perhaps differing from Shigella by being able to live in both commensal and epithelial mucosa niches. This will depend on the selection pressures involved. There might be a point of no return when EIEC strains lose sufficient commensal E. coli properties.
The elucidation of relationships of both housekeeping and plasmid genes of EIEC and Shigella warrants an end of the demarcation of the two forms, which should be regarded as a single pathovar of E. coli.
This research is supported by grants from the National Health and Medical Research Council of Australia to P.R.R. and R.L. and by the National Council of Research of Brazil (CNPq) and the State of São Paulo Research Foundation (FAPESP) to M.B.M.
We thank Karl Bettelheim for providing strains and the anonymous referees for comments and suggestions.
Editor: V. J. DiRita