The acquisition and loss of genetic elements in E. coli
O157:H7 is thought to affect the virulence of this pathogen in humans [17
]. While attributes like the Stxs [20
] and the locus of enterocyte attachment and effacement (LEE) [23
] are well known, other less well-characterized elements such as non-LEE effectors (NLEs) [24
] may contribute to the spectrum of virulence that has been captured for STEC within the seropathotype classification [25].
This genomic diversity can largely be attributed to bacteriophages [4
] and other mobile elements [27
] that cause DNA segment insertion and deletion events, rather than single-nucleotide changes [28
]. The best characterization of diversity within E. coli
O157:H7 must therefore take both single-nucleotide changes and large-region turnover events into consideration. Whole-genome sequencing is the best method for such analysis, but until more strains are completely sequenced and whole-genome sequencing becomes both routine and cost-effective, an estimate of genome-wide change based on sampling polymorphisms or variability at a number of specific loci will have to suffice. A number of molecular genotyping methods have recently emerged that target different regions and types of variation, so combining data from these methods can offer a picture that is greater than the sum of their individual views. It can be argued that lateral gene transfer obscures the relationships among bacterial strains; however, once mobile elements are acquired, and if they are stably maintained, they can be especially valuable in assessing strain relationships. We therefore advocate the use of genotyping methods that rely on multiple spatially distinct loci to provide a robust view of the O157:H7 population structure. Each multi-locus method in our study pointed to a similar tree structure, and the combination of methods in the supernetwork (Figure ) showed three distinct lineages, two of which were originally proposed by Kim et al. [6
] and the third by Zhang et al. [7
], adding confidence to the conclusion that these lineages exist.
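The idea of combining single-nucleotide and insertion/deletion data can be illustrated with a minimal sketch. It is not the supernetwork method used in this study; it simply averages two Hamming-type distances, one over SNP alleles and one over locus presence/absence calls. The function names, strain names and all data values below are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    if len(a) == 0:
        raise ValueError("profiles must not be empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def combined_distances(snp_profiles, locus_profiles):
    """Average the SNP-based and presence/absence-based distance per strain pair.

    Equal weighting of the two data types is an assumption of this sketch.
    """
    strains = sorted(snp_profiles)
    result = {}
    for s, t in combinations(strains, 2):
        d_snp = hamming_fraction(snp_profiles[s], snp_profiles[t])
        d_loc = hamming_fraction(locus_profiles[s], locus_profiles[t])
        result[(s, t)] = (d_snp + d_loc) / 2
    return result


# Made-up toy data: four SNP sites and four presence/absence loci per strain.
snps = {"strainA": "ACGT", "strainB": "ACGA", "strainC": "TCAA"}
loci = {
    "strainA": (1, 1, 0, 1),
    "strainB": (1, 1, 0, 0),
    "strainC": (0, 0, 1, 0),
}

dist = combined_distances(snps, loci)
for pair, d in sorted(dist.items()):
    print(pair, d)
```

A distance table of this kind could then be passed to any standard clustering or tree-building routine; the choice of weighting between data types would need to be justified for real data.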
SNP genotyping of more than 500 O157:H7 isolates found hyper-virulent clade 8 strains, the causative agents in the spinach- and lettuce-related outbreaks of 2006 in the United States, to be most closely related to sorbitol-fermenting E. coli
O157:NM strains [12
]. Based on our in silico
analyses, these clade 8 strains, which appear to be increasing in prevalence and have a greater association with HUS than other strains [12
], are members of lineage I/II (Table ). Interestingly, E. coli
O157:H7 strain TW14588 was designated as clade 8 in the initial publication on variation in virulence among the clades [12
], but all in silico
analyses conducted during this study on the whole-genome shotgun sequence available from GenBank (NZ_ABKY) suggest that it belongs to clade 2 (lineage I).
It must be recognized that the publicly available genome sequences are not error-free and that this could have affected the architecture of trees that depend heavily on single-nucleotide changes in the sequences of target genes, such as those derived from SNP genotyping. However, the in silico
tree architecture was very similar to that described by Manning et al. [12
] based on experimental data; therefore we suspect that this was not a significant source of error in this study.
Differences between lineage I and lineage II strains have been described [29
], but much less is known about lineage I/II strains with respect to host/disease association or expression of virulence attributes. We have recently demonstrated in a study of E. coli
O157:H7 strains in Canada that certain phage types (PTs) are specific to O157:H7 lineages [29
]. In the latter study, PT2 strains were shown to belong exclusively to lineage I/II; however, other strains in lineage I/II belonged to PTs that were not lineage-restricted (e.g. PT23, PT8 and PT1), suggesting that this lineage is both widespread and diverse. Such diversity was also apparent in the examination of the novel regions of O157:H7 DNA, presented in Figure .
The Stxs, which are bacteriophage-encoded and the primary virulence factors of E. coli
] show differential distribution among the lineages. The stx2
gene was found in nearly all O157:H7 lineage I and lineage I/II strains [12
], while the stx1
gene was absent in lineage I/II strains but present in nearly all other strains studied. Additionally, Ziebell et al. [29
] found the stx2c
gene in 96.7% of lineage II strains, 50.0% of lineage I/II strains and 1.8% of lineage I strains, while Manning et al. [12
] found stx2c
to be present in 57.6% of clade 8 strains. This SNP genotyping study also found a significant relationship between the presence of stx2
in conjunction with stx2c
among strains of clade 8, in that no other clade associated with human illness displayed this combination. This is not surprising as lineage I strains only rarely contain stx2c
despite their high association with human disease and lineage II strains nearly always contain stx2c
despite a rare association with human disease [6
]. It is unlikely that the combination of stx2 and stx2c
alone is the reason for the hyper-virulence of lineage I/II strains, as the presence of stx2c
is nearly ubiquitous among bovine-associated lineage II strains. The SNP genotyping study cited above considered only isolates associated with human disease; it is therefore likely that few lineage II strains were included. A study by Friedrich et al. [32
] examining stx2
subtypes and their association with clinical symptoms found stx2c
to be the only subtype besides stx2
present in strains isolated from cases of HUS, but found no correlation between the presence of stx2c
and the development of HUS. It has recently been shown that the level of Stx2
production is greater in lineage I strains than lineage II strains [33
], so it may be that lineage I/II strains implicated in cases of HUS simply produce more toxin than other O157:H7 strains. However, it is also possible that other factors possessed by the hyper-virulent lineage I/II strains, which remain to be discovered, are responsible for their greater virulence in humans.
The findings in this study highlight the need for a common genotyping approach, as it is evident that the same groups of genetically related strains have been given multiple designations based on the use of different comparative genotyping methods. This need exists for epidemiological studies of outbreak strains, where strain discrimination is the primary focus, as well as for population genetic studies where genomic information is of central importance. The value of being able to compare a pattern produced in a particular laboratory to one found in a national central database has made PFGE and PulseNet very useful in tracking outbreaks that are widely disseminated [34
], despite the fact that PFGE is labour intensive and difficult to standardize [35
]. Because a common approach to identifying outbreak strains has proved useful for a pattern-based method such as PFGE, approaches based on multi-locus sampling of the genome could extend this centralized database concept to include presence/absence data for specific loci, SNPs, and other measures of heterogeneity. In this way, whether the goal is identifying an outbreak source or conducting a population-based study, the data would be available in a central repository. Such a system would be important for monitoring and identifying the emergence of new clones of O157:H7, such as the hyper-virulent lineage I/II/clade 8 strains, and for recognizing other changes in the population when genotyping E. coli
O157:H7 strains associated with disease outbreaks.
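As a rough illustration of what a record in such a central repository might hold, the sketch below stores presence/absence calls and SNP alleles per isolate and looks up the stored isolates closest to a query. The class name, field names, marker names and all values are hypothetical; no existing database schema is implied.

```python
from dataclasses import dataclass, field


@dataclass
class IsolateRecord:
    """One isolate's multi-locus genotype (hypothetical layout)."""
    isolate_id: str
    locus_presence: dict = field(default_factory=dict)  # locus name -> bool
    snp_alleles: dict = field(default_factory=dict)     # SNP site name -> base


def mismatches(query, record):
    """Count differences over the markers that both records report."""
    shared_loci = query.locus_presence.keys() & record.locus_presence.keys()
    shared_snps = query.snp_alleles.keys() & record.snp_alleles.keys()
    n = sum(query.locus_presence[k] != record.locus_presence[k] for k in shared_loci)
    n += sum(query.snp_alleles[k] != record.snp_alleles[k] for k in shared_snps)
    return n


def closest_records(query, repository, max_mismatches=0):
    """Return (mismatch count, isolate id) pairs within the allowed threshold."""
    hits = [(mismatches(query, r), r.isolate_id) for r in repository]
    return sorted(h for h in hits if h[0] <= max_mismatches)


# Made-up toy repository and query.
repository = [
    IsolateRecord("iso1", {"locusA": True, "locusB": False}, {"snp1": "A"}),
    IsolateRecord("iso2", {"locusA": False, "locusB": False}, {"snp1": "G"}),
]
query = IsolateRecord("new", {"locusA": True, "locusB": False}, {"snp1": "A"})

exact = closest_records(query, repository)
near = closest_records(query, repository, max_mismatches=2)
print(exact, near)
```

Comparing only markers that both records report is one possible design choice; it lets laboratories that type different marker subsets still query the same repository, at the cost of less discriminating matches.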
The availability of whole genome sequence information has led to the development of new genotyping methods that are easier to perform than traditional methods, more discriminatory and more informative with respect to genotype and phenotype. It is interesting that the approaches targeting genetic polymorphisms within conserved genes and those targeting genetic changes based on gene insertion/deletion events converge to give a similar picture of E. coli
O157:H7 strain relationships. Such concordance of methods has been previously demonstrated with mCGH and MLST for Campylobacter jejuni
] and Streptococcus pneumoniae
]. However, given that methods differ greatly in the time, labour and equipment required for the analysis, the need for expertise, freedom from subjectivity in interpreting the data, and portability of the genotyping results from one laboratory to another, there is considerable advantage in selecting a method that is simple, extensible and easily portable. While some typing methods are better than others in these respects, most of the multi-locus typing methods examined produced a similar tree architecture. This suggests that a "common typing language" is possible, at least in the sense that genotypes derived using different methods can be integrated and communicated within a broader framework. Although multiple methods were shown to converge on a similar picture of the O157:H7 population structure, we are not advocating the routine use of multiple genotyping methods, but rather the use of methods based on comparative genomics.
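One simple way to quantify how well two typing methods agree is the Rand index: the fraction of strain pairs that both methods treat the same way, either grouping them together or separating them. This is a generic measure and was not necessarily used in this study. The sketch below assumes each method's output has been reduced to a group label per strain; the function name, strain names and labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of strain pairs on which two groupings agree.

    A pair counts as agreement if both methods place the two strains in the
    same group, or both place them in different groups.
    """
    if labels_a.keys() != labels_b.keys():
        raise ValueError("both methods must type the same strains")
    strains = sorted(labels_a)
    if len(strains) < 2:
        raise ValueError("need at least two strains")
    agree = 0
    total = 0
    for s, t in combinations(strains, 2):
        same_a = labels_a[s] == labels_a[t]
        same_b = labels_b[s] == labels_b[t]
        agree += same_a == same_b
        total += 1
    return agree / total


# Made-up toy groupings from two hypothetical typing methods.
method_a = {"s1": 1, "s2": 1, "s3": 2, "s4": 3}
method_b = {"s1": "x", "s2": "x", "s3": "y", "s4": "y"}

score = rand_index(method_a, method_b)
print(round(score, 3))
```

In this toy case the two methods disagree only on the pair (s3, s4), which method_b groups together and method_a separates, so five of the six pairs agree.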
With the genomic sequencing revolution well under way, the ability to harvest novel sequence information from new genomic sequences in a timely fashion will become increasingly important, and the ability to include in silico results and compare them with those from traditional laboratory experiments will become necessary.
The results of this study suggest that genotyping approaches based on common comparative genomic data are likely to form the basis for the next generation of analytical tools used for both population-based comparative genotyping and epidemiological studies.