Comparative analysis of genomes from 17 commensal and pathogenic
E. coli strains has revealed a diverse species 'pan-genome', while the
E. coli 'core conserved' genome was calculated to be about one-half of the genome of a given
E. coli isolate [
47]. Although EHEC utilize similar virulence mechanisms, this pathotype comprises phylogenetically distinct lineages that vary in their ability to cause disease in both humans and animals. Clearly, the genome of a single strain cannot reflect how the genomic diversity among EHEC strains influences pathogenesis of the EHEC population. Because no strains from the EHEC 2 clonal group have been sequenced, the genetic variability of 24 EHEC 2 strains was examined in relation to the distribution of genes from O157:H7 Sakai, which belongs to the EHEC 1 clonal group. The Sakai genome was used in this study, as its annotation is suggested to include more strain-specific genes than that of EDL933 [
47]. Genes specific to the EHEC 2 group have yet to be described. Some genes shared with Sakai might have been missed in our study if their sequences had diverged to the point where the 70-mer oligonucleotide probes and the stringency of competitive hybridization precluded detection. Although this study allowed screening of known genes only, the gene content data still offered new insight into strain relatedness and the distribution and subsequent diversification of mobile elements within the EHEC 2 clonal group.
The CGH data presented here indicate that there are two distinct trends, which reflect the bacterial (vertical) and phage (lateral) origin of genes, impacting the genomic divergence of EHEC 2. Virtually the entire set of backbone genes was present within the EHEC 2 clonal group (Tables and ). CGH inferences pertaining to the distribution of backbone genes can vary depending on array type, sample size, and strain diversity [
46]. For example, Anjum
et al. have proposed that the O26 serogroup exhibits greater genetic homogeneity than was observed in our study [
48]; however, the microarray platform used in that study was limited to the genome of K-12 MG1655. Despite these differences, the degree of conservation among backbone genes in this CGH investigation was similar to that reported in previous studies [
46,
49,
50]. The conservation of Sakai-specific genes in EHEC 2 was, not surprisingly, noticeably lower than that of the backbone, which is consistent with established findings about intraspecies genomic variability [
40,
51,
52]. The conservation of Sakai phage genes was, however, found to be more than 2-fold higher than that of Sakai bacterial genes (Figure and Table ). In O55:H7, the inferred ancestor of O157:H7 [
53], the ratio of Sakai phage to bacterial gene conservation was the opposite of that observed in EHEC 2; this suggests that Sakai bacterial genes have been vertically acquired from the O55:H7 progenitor and are not disseminated among the EHEC 2 clone. Cursory assessment of K-12-specific genes suggests a homogeneous distribution in EHEC 2, with less than half of the genes present; most K-12 phage-related genes were found to be uniformly divergent/absent from the entire EHEC 2 population (Additional file
4). Assessing the conservation of K-12 specific genes was, however, beyond the scope of this study, as K-12 MG1655 is a non-pathogenic laboratory-derived strain that is distantly related to EHEC (Figure ).
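The phage-versus-bacterial comparison discussed above amounts to a ratio of two conservation fractions. The following is a minimal sketch of that arithmetic, not the analysis pipeline used in this study: the function name `conservation` and the presence/absence calls are hypothetical toy values chosen only to illustrate the calculation.

```python
def conservation(calls):
    """Fraction of gene calls scored present (1) across all strains.

    `calls` is a list of per-strain lists; 1 = present, 0 = divergent/absent.
    """
    flat = [c for strain in calls for c in strain]
    return sum(flat) / len(flat)

# Hypothetical toy calls (rows are strains, columns are genes); not study data.
phage_calls = [[1, 1, 0, 1], [1, 0, 1, 1]]      # Sakai-specific phage genes
bacterial_calls = [[0, 1, 0, 0], [1, 0, 0, 0]]  # Sakai-specific bacterial genes

# Fold difference in conservation between the two gene classes.
fold = conservation(phage_calls) / conservation(bacterial_calls)
print(f"phage: {conservation(phage_calls):.2f}, "
      f"bacterial: {conservation(bacterial_calls):.2f}, fold: {fold:.1f}")
```

A fold value above 1 would correspond to the pattern reported for EHEC 2, and a value below 1 to the opposite pattern described for O55:H7.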
The increased presence of Sakai phage genes in the EHEC 2 group compared to Sakai bacterial genes reveals independent acquisition and exchange of similar mobile elements. For example, of the 152 Sakai-specific genes present in EHEC 2, only 26 genes were not found in 11 completed non-EHEC
E. coli and
Shigella spp. genomes. About one-half of the 26 "EHEC only" genes were found in
stx1-encoding phages BP-4795 and CP-1639 from STEC O84:H11 and O111:H-, respectively [
54,
55]. Sakai genes identified by BLASTN as present on BP-4795 are disseminated on phages Sp6, 9, 10, and 12, which is in agreement with the evidence for recombination between phages [
56]. Although the number of phage genes shared by all tested strains was low, the percentage of those that were VAP was high (Table ), which may reflect sequence heterogeneity in prophage genomes with similar modular structures [
54,
56,
57], and not true absence of genes.
Phylogenetic network analysis implied a serotype-specific uniformity of O111:H8 strains, unlike other EHEC 2 strains (Figure ), which can also be inferred from the arrangement of Sakai phage genes in O111:H8 strains (Figure ). Interestingly, these six EHEC 2 representatives were the only strains with the θ intimin allele, whereas the remaining eighteen EHEC 2 strains had β intimin, as determined by PCR-based RFLP typing of
eae; the method for
eae typing was described previously [
58]. By contrast, members of the EHEC 1 clonal group (i.e., O157:H7 and O55:H7) typically had the γ allele. Although intimin θ has been found in an atypical EPEC O55:H7 and a non-EHEC 2 strain (GenBank Acc. No.
AJ833638 and
AF253561), O111:H8 is, to our knowledge, the only EHEC 2 serotype with this intimin allele, providing further support for the hypothesis that O111:H8 represents a distinct grouping.
Based on the distinguishing distribution of Sakai genes (Figures and ), serotype O26:H11 appears to be considerably more diverse than the distinct and more uniform O111:H8. This suggests that the genetic make-up of O26:H11 allows more frequent lateral exchange of DNA elements, which can result in acquisition of novel fitness and virulence genes by O26:H11 more commonly than by other EHEC 2. For example, O26:H11 possesses the
Yersinia spp. high pathogenicity island (HPI) that encodes the iron-uptake siderophore yersiniabactin and the pesticin receptor, whereas other EHEC serotypes, including O157:H7, O111:H-, O103:H2, and O145:H-, do not have this HPI [
59]. The diversity of O26:H11/H- has also been implied with other methods [
60].
A proportion of the EHEC 2 hybridization data (15% of the PI genes) were identified as genes that are phylogenetically compatible with each other, i.e., having no homoplasy. Although this represents a small number of genes, it is remarkable that the distribution pattern grouped EHEC 2 O111:H8 and O118:H16 strains by serotype (Figure ). The pathogenic
E. coli used in this study represent tips of phylogenetic branches, where high frequencies of recombination strongly impact the shaping of genomic content [
61] and eventually lead to erosion of the phylogenetic signal between clonal complexes [
62]. Thus, the set of genes shared with EHEC 1 O157:H7 whose pattern of presence and absence in EHEC 2 is phylogenetically compatible and coincides with serotype, rather than being random, warrants further investigation.
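The text above does not state how compatibility was computed. One common criterion for binary presence/absence characters is the four-gamete test: two genes are compatible (explainable on a single tree without homoplasy) unless all four state combinations occur among the strains. The sketch below illustrates that criterion only; the function names and the gene profiles are hypothetical and are not taken from the study data.

```python
from itertools import combinations

def compatible(a, b):
    """Four-gamete test for two binary characters scored over the same strains.

    The pair is compatible unless all four state combinations
    (0,0), (0,1), (1,0) and (1,1) are observed.
    """
    return len(set(zip(a, b))) < 4

def incompatible_pairs(profiles):
    """Return every pair of genes whose presence/absence patterns conflict."""
    return [(g, h) for g, h in combinations(profiles, 2)
            if not compatible(profiles[g], profiles[h])]

# Hypothetical calls for five strains (1 = present, 0 = divergent/absent).
profiles = {
    "gene_a": [1, 1, 0, 0, 0],
    "gene_b": [1, 1, 1, 0, 0],
    "gene_c": [1, 0, 1, 0, 1],
}

conflicts = incompatible_pairs(profiles)
print(conflicts)
```

In this toy set, `gene_a` and `gene_b` are mutually compatible, while `gene_c` conflicts with both, which is the kind of pattern expected for a character affected by lateral transfer or parallel loss.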
The heterogeneity of Stx phages has been demonstrated [
57,
63], even within the O157:H7 lineage itself [
64,
65], so it is not unexpected to find such variation between different EHEC 2 strains. In addition, Ogura
et al. propose that Stx phages have alternative integration sites in EHEC 2 [
46]; this may explain our lack of detection of integrase genes, as integration site specificity is dependent on the alignment of the phage integrase with the attachment sequence in the bacterial chromosome [
66]. Strains that were
stx negative in our study were, nevertheless, found to carry genes from the Sp15 and Sp5 phages, which is a common effect of frequent modular shuffling of sequences between phages of related enteric hosts [
56,
67,
68]. The significance of the unique conservation patterns of Sp10 and Sp18 phage genes is not clear. Sp10 is perhaps more conserved as it harbors non-LEE effector genes [
42], all 3 of which were detected in at least 22 out of 24 EHEC 2 strains. Absence of the entire Sp18 was also detected among O157:H7 strains [
65], one of which belongs to a hyper-virulent lineage of the O157:H7 population [
69].
Incongruent divergence of LEE operons has been previously suggested. Studies indicate that this island is a dynamic region [
70], and that different selective pressures act on different parts of the LEE [
71]. The sequence diversity of the LEE, both at the nucleotide and amino acid level, increases along the length of the island from the LEE1 to the LEE4 operon [
71,
72]. A comparable trend can be observed in the CGH data presented here, as there was greater conservation of the content of genes that encode the secretion apparatus (LEE1–3). However, differences in the content of O157:H7 Sakai LEE genes between human and animal EHEC 2 strains of the same serotype (Figure and Table ) suggest that the LEE has diverged between EHEC 2 strains in a host-dependent manner, possibly due to host-species adaptive pressure. This result was not expected and its implications are not supported by the current literature. Multiple, parallel acquisitions of the LEE by different clonal groups have been inferred [
37,
73-
75].
Muniesa
et al. suggest that the LEE genes associated with serogroup O26 are present more commonly in STEC than the LEE genes associated with EHEC O157:H7 or EPEC O127:H6 [
76]. Yet, there is no clear evidence to support the hypothesis that LEE divergence within a lineage results from positive adaptive pressure in different host species. In fact, when several LEE genes from strain RDEC-1 were compared to those from other AEEC, the variation appeared to be associated with evolutionary lineage and not host specificity [
77]. Even so, given the heterogeneous diversification of this island and the recent inference about host-specific expression of
espA and
eae in O157:H7 [
78], it would be interesting to compare complete LEE sequences from a larger sample of EHEC 2 strains of human and animal origin.