Recent evidence suggests that non-O157 strains are responsible for an increasing number of human infections and that the role of these strains in causing diarrheal disease may be just as important as that of O157 strains (15). Obtaining true prevalence estimates for non-O157 infections is difficult due to the lack of mandatory characterization of isolates, the loss of isolates due to the use of nonculture diagnostics, and the lack of standardized testing procedures. The aim of this study was to compare the O-antigen gene cluster sequences of O26, O45, O103, O111, O121, and O145 E. coli strains to explore possible nucleotide associations with serogroups and pathotypes.
We found that there was a substantial amount of O-antigen gene cluster variation both within and between O serogroups. The degree of variation detected within and between O-antigen genes of E. coli was not unexpected, given the lateral gene transfer events that often result from environmental pressures and the necessity to adapt for survival (36, 49). A complete genome comparison study between an E. coli O157:H7 strain isolated from the Sakai outbreak and a benign lab strain of E. coli, K-12 MG1655, found that the two strains shared a 4.1-Mb sequence that was believed to be the E. coli backbone; the remaining 1.4 Mb consisted of O157:H7-specific sequences that appeared to be foreign DNA acquired through lateral gene transfer (36). Another genome comparison study on O26, O111, and O103 E. coli strains found a large number of strain-specific genes; however, virulence genes were well conserved between strains (49). The conservation of virulence genes between strains may be explained by the results of another study, which showed through phylogenetic analysis that pathogenic strains of E. coli have evolved and acquired virulence plasmids in parallel (54). Dendrogram analyses of our strains also support this hypothesis. We found that the majority of the non-STEC strains had a different lineage from the STEC strains within the O serogroups ( to ); however, STEC strains were more closely related to non-STEC strains within the same O serogroup than to STEC strains in the other O serogroups (data not shown).
Gene differences in the O-antigen gene clusters of E. coli strains have been used to create PCR-based assays for detection of specific non-O157 strains (24, 28). PCR assays were developed that were specific to the O26 and O103 serogroups and were also able to detect O26 and O103 strains in apple juice (24). One of the major differences between previous PCR-based assay targets and the polymorphisms found in this study is that the latter can differentiate between STEC and non-STEC strains. Many strains of non-O157 E. coli are not considered pathogenic, and it will be important for the meat industry, when responding to the mandatory testing requirements beginning in 2012, to be able to differentiate between regulated disease-causing non-O157 STEC strains and non-STEC strains that only share the same O-antigen genes.
We identified polymorphisms in a collection of E. coli strains from each of the six O serogroups that were not only specific to the O serogroup but also, to a great extent, unique to STEC strains. There were several false negatives and false positives associated with the identified STEC-associated alleles for the strains sequenced in our collection; these included five false positives for the O26 strains, two false positives for the O103 strains, four false positives for the O111 strains, and two false negatives and one false positive for the O121 strains. However, 100% sensitivity or specificity for any STEC-associated allele would be highly unlikely, given the evolutionary nature of E. coli and lateral gene transfer. It is interesting to note that the two false-negative O121 strains (which contained stx1 but did not have the STEC-associated allele) did not contain eae or ehxA and were positive for the H7 antigen. This is in contrast to the majority of O121 STEC strains, which are positive for stx2 and the H19 antigen. All but one of the false-positive strains in our sequencing panel (i.e., strains that had the STEC-associated allele but did not contain stx) contained eae.
Research is ongoing, and questions remain unanswered as to what genes are necessary to confer virulence in humans (3, 5, 67). Research has shown that stx is associated with severe disease and hemolytic-uremic syndrome (HUS) (34), whereas eae is associated with attachment of the bacteria to the epithelium (1). Several studies have shown a significant association of stx2 and eae with severe clinical disease (11, 27, 64). The O121 strains sequenced in this study may have recently acquired stx1 or lost eae and ehxA. Another study conducted in Germany found that EPEC strains associated with bloody diarrhea and cases of HUS were closely related to STEC strains and clustered in the same multilocus sequence typing (MLST) clonal complexes (7). The authors hypothesized that these EPEC strains were originally STEC strains that lost stx once infection was established in the patients. Other studies have suggested that STEC strains evolved from EPEC strains by acquiring stx (54, 66). In the absence of genes encoding alternate mechanisms for colonizing hosts, STEC strains that do not contain eae or ehxA may be reduced in virulence and less clinically important. These strains may have once had the ability to cause severe disease but, due to the loss of attachment genes, may now be unable to sustain infection in a human host. The differences in the H antigen between the stx1 and stx2 strains also support the alternative hypothesis that an stx1 phage infected a particular lineage of O121 strains.
The five false-positive O26 strains (which have the STEC allele but do not contain stx) are also of particular interest because four of them contain eae (). Research has been conducted on the ability of O26 aEPEC strains to acquire stx-carrying phages and O26 EHEC strains to lose stx in vitro (8). It has been found that stx-carrying bacteriophages facilitate the bidirectional conversion between O26 aEPEC and EHEC pathotypes. Bugarel et al. identified espK as a unique genetic marker for EHEC and EHEC derivative strains (EHEC strains that have lost stx) (18). The espK gene was present in three of the four false-positive strains that contained eae. Two of these strains were found in cluster 4 and had the STEC alleles for polymorphisms rmlA 30 G→T and fnl1 88 G→A. The third strain was found in cluster 2 and contained the STEC allele for polymorphism wzx 953 G→T. Interestingly, this strain also contained nleB, whereas the other false-positive strain in cluster 2 did not contain espK or nleB. The finding of espK in the O26 false-positive strains suggests that these were EHEC strains that lost their Shiga toxin-encoding genes.
Overall, the MALDI-TOF assays used to determine the frequency of the STEC alleles accurately detected STEC strains within serogroups in this study (). The panel of bacterial strains used to validate the MALDI-TOF assays was independent of the bacterial strains used in the sequencing panel. Due to the complex relationship between the presence of virulence genes and the ability to cause human infection, we estimated the sensitivity and specificity for each of the assays using three different virulence gene classification groups. The first estimate classified all strains with stx alone as true positives, the second classified all strains that contained stx with eae as true positives, and the third classified all strains that contained stx with both eae and ehxA as true positives. The different virulence gene classification groups affected the sensitivity and specificity estimates only for the O serogroups with a large diversity of strains, which included the O26, O45, O111, and O121 strains. Sensitivity and specificity estimates for classification with stx alone and stx with eae did not vary dramatically for the O111 and O26 groups because the majority of the strains with stx also contained eae. The O121 group and several of the assays in the O45 group showed an increased sensitivity under the stx-with-eae and stx-with-eae-and-ehxA classifications because strains with stx that did not contain eae or ehxA also did not have the STEC-associated allele ().
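To illustrate how the three virulence gene classification groups can change the estimates for a single assay, the following is a minimal sketch in Python. It is not the analysis code used in this study: the `Strain` record, the `DEFINITIONS` table, the `sensitivity_specificity` helper, and the eight-strain toy panel are all hypothetical and chosen only to show the arithmetic (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    """Hypothetical record for one validation strain."""
    stx: bool
    eae: bool
    ehxA: bool
    has_allele: bool  # assay call: STEC-associated allele detected


# Three reference definitions of a "true" STEC strain, mirroring the
# three classification groups described in the text.
DEFINITIONS = {
    "stx": lambda s: s.stx,
    "stx+eae": lambda s: s.stx and s.eae,
    "stx+eae+ehxA": lambda s: s.stx and s.eae and s.ehxA,
}


def sensitivity_specificity(strains, is_positive):
    """Return (sensitivity, specificity) of the allele call under one definition."""
    tp = sum(1 for s in strains if is_positive(s) and s.has_allele)
    fn = sum(1 for s in strains if is_positive(s) and not s.has_allele)
    tn = sum(1 for s in strains if not is_positive(s) and not s.has_allele)
    fp = sum(1 for s in strains if not is_positive(s) and s.has_allele)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy panel (invented data, not from the study):
PANEL = (
    [Strain(True, True, True, True)] * 3        # stx, eae, ehxA; allele present
    + [Strain(True, False, False, False)]       # stx only; allele absent
    + [Strain(False, True, False, True)]        # eae only; allele present
    + [Strain(False, False, False, False)] * 3  # no virulence genes; allele absent
)

results = {
    name: sensitivity_specificity(PANEL, rule)
    for name, rule in DEFINITIONS.items()
}

for name, (sens, spec) in results.items():
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy panel, the single stx-only strain without the allele counts as a false negative under the stx-alone definition but as a true negative under the two stricter definitions, so the sensitivity estimate rises when eae is required. This is the same kind of shift described above for the O121 group and some of the O45 assays.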
Overall, the sensitivity estimates of the 21 assays were high, except for the assay targeting the O103 wbtD 937 C→T polymorphism (75.2%) (). The low sensitivity was a result of 25 O103:H25 strains that did not have the STEC-associated allele and were classified as false negatives. Interestingly, two of the O103:H25 strains, one of which was included in the SNP discovery set, did have the STEC-associated allele. Additional sequencing of O103:H25 strains will be required to determine whether an alternate or additional SNP is needed to capture O103:H25 STEC strains.
The majority of the assays had a high specificity, except for the O26 rmlA 30 G→T (with wzx 953 T→G) (56.2%) and O111 wbdH 1006 G→A (44.9%) assays (). The low specificity estimates were a result of the large number of false positives. A large number of non-O111 or non-O26 E. coli strains as well as Salmonella strains contained the O26 or O111 STEC-associated alleles for these two assays. One explanation for the large number of false positives may be the close relationship between E. coli and Salmonella. The polymorphisms in these two assays may be contained within a region that is highly conserved both between E. coli serogroups and between E. coli and Salmonella. Other studies have also found a high degree of similarity in the genetic sequences of the O antigens in E. coli and Salmonella enterica (37, 43).
Thirty-five (21.3%) of the strains sequenced in this study did not contain stx but did contain eae and were classified as EPEC, and the majority were O26 or O111 strains. Dendrogram analysis revealed some interesting patterns in regard to potential virulence factors and EPEC versus STEC strains ( and ). The role of these strains in human illness is not fully understood; however, from the dendrogram analysis it appears that distinct lineages have evolved, perhaps through environmental or selective pressures to promote survival of the bacteria. In the O111 and O26 dendrograms, there was a distinct lineage separation, with the majority of the STEC and EPEC strains clustered separately. The majority of the O111 EPEC strains were typical and contained nleB (), but the O26 EPEC strains were atypical and only a few contained nleB (). In contrast to nleB, which was found only in the EPEC strains in the O26 and O111 serogroups, efa1 (lifA) was found in the majority of both the EPEC and STEC strains. Both nleB and efa1 (lifA) are located on the same pathogenicity island (OI-122); however, other studies have also reported variation in the carriage of these genes between EPEC strains and the existence of two main variants of the OI-122 pathogenicity island (3). It was also interesting to note that nleB was found in the STEC strains of the O45, O103, and O145 serogroups (, , and ) and was absent in all of the O121 strains (). Different selective pressures and evolutionary niches may be responsible for differences in the carriage of nleB in O26 and O111 EPEC strains versus O45, O103, and O145 STEC strains. In contrast to the results found in this study, Bugarel et al. detected nleB in EHEC strains belonging to the O26:H11, O103:H2, O111:H8, O121:H19, and O145:H28 serotypes (16, 17). The primary reason for the contrasting results is most likely the different sample populations. Bugarel et al. investigated strains from the National Reference Laboratory for E. coli at the Federal Institute for Risk Assessment in Berlin, Germany, and the French Food Safety Agency in Maisons-Alfort, France. The strains investigated in this study were primarily from sources in the United States. It is not unusual to see differences in strain carriage across geographic regions or across source demographics.
Another group of interest is highlighted in green in the O111 dendrogram (). Strains in this group had identical O-antigen gene sequences and virulence gene profiles and were mostly serotype O111:H11. It is interesting that the majority of these strains grouped separately from the other O111 STEC strains, because the H11 antigen strains are primarily found in bovine hosts and do not typically cause disease in humans. The STEC serotype O111:H8 strains highlighted in red () are more commonly associated with human illness (14, 19). These STEC O111:H11 strains contained the same polymorphisms as the O111:H8 STEC strains but also had two additional unique polymorphisms.
The presence of particular virulence genes does not appear to be host specific. In the O111 serogroup, nleB was found mostly in strains derived from humans, whereas in the O26 serogroup, nleB was found in environmental strains (any strain from a nonhuman source). In the O111 and O26 serogroups, ehxA was found in strains originating from both human and environmental sources. However, in order to further explore relationships between the genes and potential host-specific factors, a larger and more diverse sample of strains is needed. The majority of the O26 EPEC strains were environmental, and the majority of both the O111 EPEC and STEC strains were from human sources. This may be the reason we did not see any host-specific gene profiles and why all the typical EPEC strains were O111 strains. Other studies have found that typical EPEC strains are most commonly isolated from human sources and not found in bovine sources (68).
The polymorphisms discovered in this study are unique because they not only differentiate between the six O serogroups but are also associated with STEC strains. Recent outbreaks of non-O157 STEC in the United States and Europe have increased awareness of these strains and highlighted the need to develop accurate tests for identification. Discussions on public health and prevention have led to the development of regulations in the meat industry concerning testing for certain non-O157 STEC strains. In order to prevent unnecessary loss of nonintact raw beef products and revenue, it will be essential to have tests available that are both fast and accurate. The polymorphisms presented in this study can be used to develop tests that should facilitate the identification of O26, O45, O103, O111, O121, and O145 STEC strains. The methods applied in this study can also be used to identify potential STEC-associated alleles in other non-O157 STEC serogroups. Sequencing of additional strains in the six serogroups presented in this article, as well as in additional serogroups, will allow us to have greater confidence in our sensitivity estimates, further understand the evolution of these strains, and potentially answer questions regarding the role of particular virulence genes in pathogenicity.