Salmonella enterica serovar Typhimurium definitive type 2 (DT2) is host restricted to Columba livia (rock or feral pigeon) but is also closely related to S. Typhimurium isolates that circulate in livestock and cause a zoonosis characterized by gastroenteritis in humans. DT2 isolates formed a distinct phylogenetic cluster within S. Typhimurium based on whole-genome-sequence polymorphisms. Comparative genome analysis of DT2 94-213 and S. Typhimurium SL1344, DT104, and D23580 identified few differences in gene content with the exception of variations within prophages. However, DT2 94-213 harbored 22 pseudogenes that were intact in other closely related S. Typhimurium strains. We report a novel in silico approach to identify single amino acid substitutions in proteins that have a high probability of a functional impact. One polymorphism identified using this method, a single-residue deletion in the Tar protein, abrogated chemotaxis to aspartate in vitro. DT2 94-213 also exhibited an altered transcriptional profile in response to culture at 42°C compared to that of SL1344. Such differentially regulated genes included a number involved in flagellum biosynthesis and motility.
Whereas Salmonella enterica serovar Typhimurium can infect a wide range of animal species, some variants within this serovar exhibit a more limited host range and altered disease potential. Phylogenetic analysis based on whole-genome sequences can identify lineages associated with specific virulence traits, including host adaptation. This study represents one of the first to link pathogen-specific genetic signatures, including coding capacity, genome degradation, and transcriptional responses, to host adaptation within a Salmonella serovar. We performed comparative genome analysis of reference and pigeon-adapted definitive type 2 (DT2) S. Typhimurium isolates alongside phenotypic and transcriptome analyses to identify genetic signatures linked to host adaptation within the DT2 lineage.
The Red Queen hypothesis proposes that coevolution of interacting species (such as hosts and parasites) should drive molecular evolution through continual natural selection for adaptation and counter-adaptation [1–3]. Although the divergence observed at some host-resistance [4–6] and parasite-infectivity [7–9] genes is consistent with this, the long time periods typically required to study coevolution have so far prevented any direct empirical test. Here we show, using experimental populations of the bacterium Pseudomonas fluorescens SBW25 and its viral parasite, phage Φ2 [10, 11], that the rate of molecular evolution in the phage was far higher when both bacterium and phage coevolved with each other than when phage evolved against a constant host genotype. Coevolution also resulted in far greater genetic divergence between replicate populations, which was correlated with the range of hosts that coevolved phage were able to infect. Consistent with this, the most rapidly evolving phage genes under coevolution were those involved in host infection. These results demonstrate, at both the genomic and phenotypic level, that antagonistic coevolution is a cause of rapid and divergent evolution, and is likely to be a major driver of evolutionary change within species.
The current Shigella sonnei pandemic involves geographically associated, multidrug-resistant clones. This study has demonstrated that S. sonnei phylogeny can be accurately defined with limited single nucleotide polymorphisms (SNPs). By typing 6 informative SNPs using a high-resolution melting (HRM) assay, major S. sonnei lineages/sublineages can be identified as defined by whole-genome variation.
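The idea of typing a small SNP panel to recover lineages can be sketched as a lookup from a six-allele profile to a lineage label. This is only an illustration under stated assumptions: the SNP positions, allele profiles, and lineage names below are hypothetical placeholders, not the markers or nomenclature used in the HRM assay.

```python
from typing import Dict, Optional, Tuple

# Hypothetical panel of six SNP positions (placeholders, not the assay's real coordinates).
SNP_POSITIONS: Tuple[int, ...] = (101, 202, 303, 404, 505, 606)

# Hypothetical allele profiles at those six positions for each lineage/sublineage.
LINEAGE_PROFILES: Dict[str, str] = {
    "lineage_I": "ACGTAC",
    "lineage_II": "GCGTAC",
    "lineage_III": "GTGTAC",
    "lineage_III_sub_a": "GTATAC",
}


def call_profile(alleles_by_position: Dict[int, str]) -> str:
    """Concatenate the alleles at the panel positions; 'N' marks a missing call."""
    return "".join(alleles_by_position.get(pos, "N") for pos in SNP_POSITIONS)


def assign_lineage(alleles_by_position: Dict[int, str]) -> Optional[str]:
    """Return the lineage whose profile matches exactly, or None if there is no match."""
    profile = call_profile(alleles_by_position)
    for lineage, expected in LINEAGE_PROFILES.items():
        if profile == expected:
            return lineage
    return None


if __name__ == "__main__":
    isolate = dict(zip(SNP_POSITIONS, "GTATAC"))
    print(assign_lineage(isolate))
```

An exact-match lookup is the simplest design; an isolate with a missing or unexpected allele is reported as unassigned rather than forced into the nearest lineage.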
In 2009, an outbreak of enterohemorrhagic Escherichia coli (EHEC) on an open farm infected 93 persons, and approximately 22% of these individuals developed hemolytic-uremic syndrome (HUS). Genome sequencing was used to investigate outbreak-derived animal and human EHEC isolates. Phylogeny based on the whole-genome sequence was used to place outbreak isolates in the context of the overall E. coli species and the O157:H7 sequence type 11 (ST11) subgroup. Four informative single nucleotide polymorphisms (SNPs) were identified and used to design an assay to type 122 other outbreak isolates. The SNP phylogeny demonstrated that the outbreak strain was from a lineage distinct from previously reported O157:H7 ST11 EHEC and was not a member of the hypervirulent clade 8. The strain harbored determinants for two Stx2 verotoxins and other putative virulence factors. When linked to the epidemiological information, the sequence data indicate that gross contamination of a single outbreak strain occurred across the farm prior to the first clinical report of HUS. The most likely explanation for these results is that a single successful strain of EHEC spread from a single introduction through the farm by clonal expansion and that contamination of the environment (including the possible colonization of several animals) led ultimately to human cases.
The immunodominant lipopolysaccharide is a key antigenic factor for Gram-negative pathogens such as salmonellae where it plays key roles in host adaptation, virulence, immune evasion, and persistence. Variation in the lipopolysaccharide is also the major differentiating factor that is used to classify Salmonella into over 2600 serovars as part of the Kaufmann-White scheme. While lipopolysaccharide diversity is generally associated with sequence variation in the lipopolysaccharide biosynthesis operon, extraneous genetic factors such as those encoded by the glucosyltransferase (gtr) operons provide further structural heterogeneity by adding additional sugars onto the O-antigen component of the lipopolysaccharide. Here we identify and examine the O-antigen modifying glucosyltransferase genes from the genomes of Salmonella enterica and Salmonella bongori serovars. We show that Salmonella generally carries between 1 and 4 gtr operons that we have classified into 10 families on the basis of gtrC sequence with apparent O-antigen modification detected for five of these families. The gtr operons localize to bacteriophage-associated genomic regions and exhibit a dynamic evolutionary history driven by recombination and gene shuffling events leading to new gene combinations. Furthermore, evidence of Dam- and OxyR-dependent phase variation of gtr gene expression was identified within eight gtr families. Thus, as O-antigen modification generates significant intra- and inter-strain phenotypic diversity, gtr-mediated modification is fundamental in assessing Salmonella strain variability. This will inform appropriate vaccine and diagnostic approaches, in addition to contributing to our understanding of host-pathogen interactions.
Bacterial pathogens frequently evolve mechanisms to vary the composition of their surface structures. The consequence is enhanced long-term survival by facilitating persistence and evasion of the host immune system. Salmonella spp. cause severe infections in a range of mammalian hosts and guard themselves with a protective coat, termed the O-antigen. Through genome sequence analyses we found that Salmonella have acquired an unprecedented repertoire of genetic sequences for modifying their O-antigen coat. There is strong evidence that these genetic factors have a dynamic evolutionary history and are spread through the bacterial population by bacteriophage. In addition to this genetic repertoire, we determined that Salmonella can and often do employ stochastic mechanisms for expression of these genetic factors. This means that O-antigen coat diversity can be generated within a Salmonella population that otherwise has a common genome. Our data significantly enhance our appreciation of the genetic and regulatory characteristics underpinning Salmonella O-antigen diversity. The role attributed to bacteriophage in generating this diversity highlights that Salmonella are acquiring an extensive repertoire of O-antigen modifying traits that may enhance the pathogen's ability to persist and cause disease in mammalian hosts. Such genetic traits may make useful markers for defining new epidemiological and diagnostic tools.
Chlamydia psittaci is the etiological agent of psittacosis and is a zoonotic pathogen infecting birds and a variety of mammalian hosts. Here we report the genome sequence of the porcine strain 01DC12 which is representative of a novel clade of C. psittaci belonging to ompA genotype E.
Integrative and conjugative elements (ICEs) are self-mobile genetic elements found in the genomes of some bacteria. These elements may confer a fitness advantage upon their host bacteria through the cargo genes that they carry. Salmonella pathogenicity island 7 (SPI-7), found within some pathogenic strains of Salmonella enterica, possesses features indicative of an ICE and carries genes implicated in virulence. We aimed to identify and fully analyze ICEs related to SPI-7 within the genus Salmonella and other Enterobacteriaceae. We report the sequence of two novel SPI-7-like elements, found within strains of Salmonella bongori, which share 97% nucleotide identity over conserved regions with SPI-7 and with each other. Although SPI-7 within Salmonella enterica serovar Typhi appears to be fixed within the chromosome, we present evidence that these novel elements are capable of excision and self-mobility. Phylogenetic analyses show that these Salmonella mobile elements share an ancestor which existed approximately 3.6 to 15.8 million years ago. Additionally, we identified more distantly related ICEs, with distinct cargo regions, within other strains of Salmonella as well as within Citrobacter, Erwinia, Escherichia, Photorhabdus, and Yersinia species. In total, we report on a collection of 17 SPI-7 related ICEs within enterobacterial species, of which six are novel. Using comparative and mutational studies, we have defined a core of 27 genes essential for conjugation. We present a growing family of SPI-7-related ICEs whose mobility, abundance, and cargo variability indicate that these elements may have had a large impact on the evolution of the Enterobacteriaceae.
Understanding the survival of resistance plasmids in the absence of selective pressure for the antibiotic resistance genes they carry is important for assessing the value of interventions to combat resistant bacteria. Here, several poorly explored questions regarding the fitness impact of IncP1 and IncN broad host range plasmids on their bacterial hosts are examined; namely, whether related plasmids have similar fitness impacts, whether this varies according to host genetic background, and what effect antimicrobial resistance gene silencing has on fitness.
For the IncP1 group, pairwise in vitro growth competition demonstrated that the fitness cost of plasmid RP1 depends on the host strain. For the IncN group, plasmids R46 and N3, whose sequence is presented for the first time, conferred remarkably different fitness costs despite sharing closely related backbone structures, implicating the accessory genes in fitness. Silencing of antimicrobial resistance genes was found to be beneficial for host fitness with RP1 but not for IncN plasmid pVE46.
These findings suggest that the fitness impact of a given plasmid on its host cannot be inferred from results obtained with other host-plasmid combinations, even if these are closely related.
Salmonella enterica is an animal and zoonotic pathogen of worldwide importance and may be classified into serovars differing in virulence and host range. We sequenced and annotated the genomes of serovar Typhimurium, Choleraesuis, Dublin, and Gallinarum strains of defined virulence in each of three food-producing animal hosts. This provides valuable measures of intraserovar diversity and opportunities to formally link genotypes to phenotypes in target animals.
RNA sequencing provides a new perspective on the genome of Mycobacterium tuberculosis by revealing an extensive presence of non-coding RNA, including long 5’ and 3’ untranslated regions, antisense transcripts, and intergenic small RNA (sRNA) molecules. More than a quarter of all sequence reads mapping outside of ribosomal RNA genes represent non-coding RNA, and the density of reads mapping to intergenic regions was more than two-fold higher than that mapping to annotated coding sequences. Selected sRNAs were found at increased abundance in stationary phase cultures and accumulated to remarkably high levels in the lungs of chronically infected mice, indicating a potential contribution to pathogenesis. The ability of tubercle bacilli to adapt to changing environments within the host is critical to their ability to cause disease and to persist during drug treatment; it is likely that novel post-transcriptional regulatory networks will play an important role in these adaptive responses.
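The comparison of read density between intergenic regions and annotated coding sequences can be illustrated by normalising read counts to the total length of each feature class. The sketch below assumes a simple reads-per-kilobase measure; the function names and the numbers in the usage example are hypothetical and are not taken from the study.

```python
def reads_per_kb(read_count: int, total_length_bp: int) -> float:
    """Read density as reads per kilobase of sequence in a feature class."""
    if total_length_bp <= 0:
        raise ValueError("total_length_bp must be positive")
    return read_count / (total_length_bp / 1000.0)


def density_ratio(
    intergenic_reads: int,
    intergenic_bp: int,
    coding_reads: int,
    coding_bp: int,
) -> float:
    """Ratio of intergenic read density to coding-sequence read density."""
    coding_density = reads_per_kb(coding_reads, coding_bp)
    if coding_density == 0:
        raise ValueError("coding read density is zero; ratio is undefined")
    return reads_per_kb(intergenic_reads, intergenic_bp) / coding_density


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(density_ratio(400, 2_000, 1_000, 10_000))
```

Normalising by length matters here because intergenic regions occupy a much smaller share of a bacterial genome than coding sequences, so raw read counts alone would understate their relative density.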
Tuberculosis bacteria are able to hide quietly inside the body for years or decades before reawakening to cause disease. If we knew more about how the bacteria change from a harmless persistent form to an aggressive disease-causing form, we could develop drugs that would be more effective in treating active tuberculosis and may also allow us to eliminate the infection before it erupts into disease. The key to this is in knowing how the bacteria determine which of their genes to express at different times. By applying modern sequencing technologies we have discovered a new putative network of gene regulation in Mycobacterium tuberculosis that is based on RNA molecules rather than protein molecules. We anticipate that this finding will open the way for new research that will allow us to understand the fundamental mechanisms underlying this deadly human disease, and that will help us to design better tools for prevention and treatment of TB.
Massively parallel sequencing of transposon-flanking regions assigned the genotype and fitness score to 91% of Escherichia coli O157:H7 mutants previously screened in cattle by signature-tagged mutagenesis (STM). The method obviates the limitations of STM and markedly extended the functional annotation of the prototype E. coli O157:H7 genome without further animal use.
We present the first genome sequence of Chlamydophila psittaci, an intracellular pathogen of birds and a human zoonotic pathogen. A comparison with previously sequenced Chlamydophila genomes shows that, as in other chlamydiae, most of the genome diversity is restricted to the plasticity zone. The C. psittaci plasmid was also sequenced.
Pandemic infectious diseases have accompanied humans since their origins [1], and have shaped the form of civilizations [2]. Of these, plague is possibly historically the most dramatic. We reconstructed historical patterns of plague transmission through sequence variation in 17 complete genome sequences and 933 single nucleotide polymorphisms (SNPs) within a global collection of 286 Yersinia pestis isolates. Y. pestis evolved in or near China, and has been transmitted via multiple epidemics that followed various routes, probably including transmissions to West Asia via the Silk Road and to Africa by Chinese marine voyages. In 1894, Y. pestis spread to India and radiated to diverse parts of the globe, leading to country-specific lineages that can be traced by lineage-specific SNPs. All 626 current isolates from the U.S.A. reflect one radiation and 82 isolates from Madagascar represent a second. Subsequent local microevolution of Y. pestis is marked by sequential, geographically-specific SNPs.
Genomic comparisons; SNP typing; phylogeography; neutral evolution; epidemic spread
Legionella pneumophila is a ubiquitous inhabitant of environmental water reservoirs. The bacteria infect a wide variety of protozoa and, after accidental inhalation, human alveolar macrophages, which can lead to severe pneumonia. The capability to thrive in phagocytic hosts is dependent on the Dot/Icm type IV secretion system (T4SS), which translocates multiple effector proteins into the host cell. In this study, we determined the draft genome sequence of L. pneumophila strain 130b (Wadsworth). We found that the 130b genome encodes a unique set of T4SSs, namely, the Dot/Icm T4SS, a Trb-1-like T4SS, and two Lvh T4SS gene clusters. Sequence analysis substantiated that a core set of 107 Dot/Icm T4SS effectors was conserved among the sequenced L. pneumophila strains Philadelphia-1, Lens, Paris, Corby, Alcoy, and 130b. We also identified new effector candidates and validated the translocation of 10 novel Dot/Icm T4SS effectors that are not present in L. pneumophila strain Philadelphia-1. We examined the prevalence of the new effector genes among 87 environmental and clinical L. pneumophila isolates. Five of the new effectors were identified in 34 to 62% of the isolates, while less than 15% of the strains tested positive for the other five genes. Collectively, our data show that the core set of conserved Dot/Icm T4SS effector proteins is supplemented by a variable repertoire of accessory effectors that may partly account for differences in the virulences and prevalences of particular L. pneumophila strains.
This plasmid is disseminated worldwide in Escherichia coli isolated from humans and animals.
Antimicrobial drug resistance is a global challenge for the 21st century with the emergence of resistant bacterial strains worldwide. Transferable resistance to β-lactam antimicrobial drugs, mediated by production of extended-spectrum β-lactamases (ESBLs), is of particular concern. In 2004, an ESBL-carrying IncK plasmid (pCT) was isolated from cattle in the United Kingdom. The sequence was a 93,629-bp plasmid encoding a single antimicrobial drug resistance gene, blaCTX-M-14. From this information, PCRs identifying novel features of pCT were designed and applied to isolates from several countries, showing that the plasmid has disseminated worldwide in bacteria from humans and animals. Complete DNA sequences can be used as a platform to develop rapid epidemiologic tools to identify and trace the spread of plasmids in clinically relevant pathogens, thus facilitating a better understanding of their distribution and ability to transfer between bacteria of humans and animals.
Bacteria; Escherichia coli; antimicrobial drug resistance; extended-spectrum beta-lactamase; CTX-M; plasmid; epidemiology; research
Genome-wide studies of bacterial gene expression are shifting from microarray technology to second generation sequencing platforms. RNA-seq has a number of advantages over hybridization-based techniques, such as annotation-independent detection of transcription, improved sensitivity and increased dynamic range. Early studies have uncovered a wealth of novel coding sequences and non-coding RNA, and are revealing a transcriptional landscape that increasingly mirrors that of eukaryotes. Already basic RNA-seq protocols have been improved and adapted to looking at particular aspects of RNA biology, often with an emphasis on non-coding RNAs, and further refinements to current techniques will improve our understanding of gene expression, and genome content, in the future.
Chlamydia trachomatis is a major cause of bacterial sexually transmitted infections worldwide. In 2006, a new variant of C. trachomatis (nvCT), carrying a 377 bp deletion within the plasmid, was reported in Sweden. This deletion included the targets used by the commercial diagnostic systems from Roche and Abbott. The nvCT is clonal (serovar/genovar E) and it spread rapidly in Sweden, undiagnosed by these systems. The degree of spread may also indicate an increased biological fitness of nvCT. The aims of this study were to describe the genome of nvCT, to compare the nvCT genome to all available C. trachomatis genome sequences and to investigate the biological properties of nvCT. An early nvCT isolate (Sweden2) was analysed by genome sequencing, growth kinetics, microscopy, cell tropism assay and antimicrobial susceptibility testing. It was compared with relevant C. trachomatis isolates, including a similar serovar E C. trachomatis wild-type strain that circulated in Sweden prior to the initially undetected expansion of nvCT. The nvCT genome does not contain any major genetic polymorphisms – the genes for central metabolism, development cycle and virulence are conserved – or phenotypic characteristics that indicate any altered biological fitness. This is supported by the observations that the nvCT and wild-type C. trachomatis infections are very similar in terms of epidemiological distribution, and that differences in clinical signs are only described, in one study, in women. In conclusion, the nvCT does not appear to have any altered biological fitness. Therefore, the rapid transmission of nvCT in Sweden was due to the strong diagnostic selective advantage and its introduction into a high-frequency transmitting population.
Rhs genes are prominent features of bacterial genomes that have previously been implicated in genomic rearrangements in E. coli. By comparing rhs repertoires across the Enterobacteriaceae, this study provides a robust explanation of rhs diversification and evolution, and a mechanistic model of how rhs diversity is gained and lost.
Rhs genes are ubiquitous and comprise six structurally distinct lineages within the Enterobacteriaceae. There is considerable intergenomic variation in rhs repertoire; for instance, in Salmonella enterica, rhs are restricted to mobile elements, while in Escherichia coli one rhs lineage has diversified through transposition as older lineages have been deleted. Overall, comparative genomics reveals frequent, independent gene gains and losses, as well as occasional lateral gene transfer, in different genera. Furthermore, we demonstrate that Rhs 'core' domains and variable C-termini are evolutionarily decoupled, and propose that rhs diversity is driven by homologous recombination with circular intermediates. Existing C-termini are displaced by laterally acquired alternatives, creating long arrays of dissociated 'tips' that characterize the appearance of rhs loci.
Rhs repertoires are highly dynamic among Enterobacterial genomes, due to repeated gene gains and losses. In contrast, the primary structures of Rhs genes are evolutionarily conserved, indicating that rhs sequence diversity is driven, not by rapid mutation, but by the relatively slow evolution of novel core/tip combinations. Hence, we predict that a large pool of dissociated rhs C-terminal tips exists episomally and these are potentially transmitted across taxonomic boundaries.
Salmonella enterica serovar Enteritidis (S. Enteritidis) has caused major epidemics of gastrointestinal infection in many different countries. In this study we investigate genome divergence and pathogenic potential in S. Enteritidis isolated before, during and after an epidemic in Uruguay.
A total of 266 S. Enteritidis isolates were genotyped using RAPD-PCR, and a selection was subjected to PFGE analysis. From these, 29 isolates spanning different periods, genetic profiles and sources of isolation were assayed for their ability to infect human epithelial cells and subjected to comparative genomic hybridization using a Salmonella pan-array and the sequenced strain S. Enteritidis PT4 P125109 as reference. Six other isolates from distant countries were included as external comparators.
Two hundred and thirty-three chromosomal genes as well as the virulence plasmid were found to be variable among S. Enteritidis isolates. Ten out of the 16 chromosomal regions that varied between different isolates correspond to phage-like regions. The 2 oldest pre-epidemic isolates lack phage SE20 and harbour other phage-encoded genes that are absent in the sequenced strain. Besides variation in prophage, we found variation in genes involved in metabolism and bacterial fitness. Five epidemic strains lack the complete Salmonella virulence plasmid. Significantly, strains with indistinguishable genetic patterns still showed major differences in their ability to infect epithelial cells, indicating that the approach used was insufficient to detect the genetic basis of this differential behaviour.
The recent epidemic of S. Enteritidis infection in Uruguay has been driven by the introduction of closely related strains of phage type 4 lineage. Our results confirm previous reports demonstrating a high degree of genetic homogeneity among S. Enteritidis isolates. However, 10 of the regions of variability described here are for the first time reported as being variable in S. Enteritidis. In particular, the oldest pre-epidemic isolates carry phage-associated genetic regions not previously reported in S. Enteritidis. Overall, our results support the view that phages play a crucial role in the generation of genetic diversity in S. Enteritidis and that phage SE20 may be a key marker for the emergence of particular isolates capable of causing epidemics.
Citrobacter rodentium (formerly Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.
High-throughput sequencing of cDNA has been used to study eukaryotic transcription on a genome-wide scale to single base pair resolution. In order to compensate for the high ribonuclease activity in bacterial cells, we have devised an equivalent technique optimized for studying complete prokaryotic transcriptomes that minimizes the manipulation of the RNA sample. This new approach uses Illumina technology to sequence single-stranded (ss) cDNA, generating information on both the direction and level of transcription throughout the genome. The protocol and associated data analysis programs are freely available from http://www.sanger.ac.uk/Projects/Pathogens/Transcriptome/. We have successfully applied this method to the bacterial pathogens Salmonella bongori and Streptococcus pneumoniae and the yeast Schizosaccharomyces pombe. This method enables experimental validation of genetic features predicted in silico and allows the easy identification of novel transcripts throughout the genome. We also show that there is a high correlation between the level of gene expression calculated from ss-cDNA and double-stranded-cDNA sequencing, indicating that ss-cDNA sequencing is both robust and appropriate for use in quantitative studies of transcription. Hence, this simple method should prove a useful tool in aiding genome annotation and gene expression studies in both prokaryotes and eukaryotes.
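A comparison of expression levels from the two library types could be summarised as a correlation coefficient over per-gene values. The sketch below uses a Pearson correlation on log2-transformed values; the abstract does not state which correlation measure or transformation was used, so both choices are assumptions, as are the function names and the pseudocount.

```python
import math
from typing import List, Sequence


def log_expression(values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """log2-transform expression values; the pseudocount (an assumption) avoids log(0)."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-gene expression values for the two library types.
    ss_cdna = [12.0, 150.0, 3.0, 980.0, 45.0]
    ds_cdna = [10.0, 170.0, 5.0, 1020.0, 40.0]
    print(pearson(log_expression(ss_cdna), log_expression(ds_cdna)))
```

Working on a log scale keeps a handful of very highly expressed genes from dominating the coefficient, which is a common reason to transform before correlating expression data.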
A global collection of plasmids of the IncHI1 incompatibility group from Salmonella enterica serovar Typhi was analyzed by using a combination of DNA sequencing, DNA sequence analysis, PCR, and microarrays. The IncHI1 resistance plasmids of serovar Typhi display a backbone of conserved gene content and arrangement, within which are embedded preferred acquisition sites for horizontal DNA transfer events. The variable regions appear to be preferred acquisition sites for DNA, most likely through composite transposition, which is presumably driven by the acquisition of resistance genes. Plasmid multilocus sequence typing, a molecular typing method for IncHI1 plasmids, was developed using variation in six conserved loci to trace the spread of these plasmids and to elucidate their evolutionary relationships. The application of this method to a collection of 36 IncHI1 plasmids revealed a chronological clustering of plasmids despite their difference in geographical origins. Our findings suggest that the predominant plasmid types present after 1993 have not evolved directly from the earlier predominant plasmid type but have displaced them. We propose that antibiotic selection acts to maintain resistance genes on the plasmid, but there is also competition between plasmids encoding the same resistance phenotype.
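Multilocus sequence typing of this kind reduces each plasmid to a tuple of allele numbers, one per locus, and maps each distinct tuple to a sequence type. A minimal sketch follows, assuming six loci as described; the locus names, allele numbers, and sequence type labels are hypothetical placeholders and not the scheme defined in the study.

```python
from typing import Dict, Tuple

# Hypothetical names for the six conserved loci.
LOCI: Tuple[str, ...] = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6")

# Hypothetical table mapping allele combinations to plasmid sequence types.
ST_TABLE: Dict[Tuple[int, ...], str] = {
    (1, 1, 1, 1, 1, 1): "PST1",
    (1, 2, 1, 1, 1, 1): "PST2",
    (2, 2, 1, 3, 1, 1): "PST3",
}


def assign_plasmid_st(alleles: Dict[str, int]) -> str:
    """Look up the sequence type for an allele profile; unknown profiles are 'novel'."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise KeyError(f"missing allele calls for: {', '.join(missing)}")
    profile = tuple(alleles[locus] for locus in LOCI)
    return ST_TABLE.get(profile, "novel")


def locus_differences(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Count the loci at which two plasmids carry different alleles."""
    return sum(1 for locus in LOCI if a[locus] != b[locus])


if __name__ == "__main__":
    plasmid_a = dict(zip(LOCI, (1, 1, 1, 1, 1, 1)))
    plasmid_b = dict(zip(LOCI, (2, 2, 1, 3, 1, 1)))
    print(assign_plasmid_st(plasmid_a), assign_plasmid_st(plasmid_b))
    print(locus_differences(plasmid_a, plasmid_b))
```

Counting the loci at which two profiles differ gives a crude distance that can be used to group related plasmid types, which is one way such typing data are commonly turned into relationships between types.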
Bacterial infections of the lungs of cystic fibrosis (CF) patients cause major complications in the treatment of this common genetic disease. Burkholderia cenocepacia infection is particularly problematic since this organism has high levels of antibiotic resistance, making it difficult to eradicate; the resulting chronic infections are associated with severe declines in lung function and increased mortality rates. B. cenocepacia strain J2315 was isolated from a CF patient and is a member of the epidemic ET12 lineage that originated in Canada or the United Kingdom and spread to Europe. The 8.06-Mb genome of this highly transmissible pathogen comprises three circular chromosomes and a plasmid and encodes a broad array of functions typical of this metabolically versatile genus, as well as numerous virulence and drug resistance functions. Although B. cenocepacia strains can be isolated from soil and can be pathogenic to both plants and man, J2315 is representative of a lineage of B. cenocepacia rarely isolated from the environment and which spreads between CF patients. Comparative analysis revealed that ca. 21% of the genome is unique in comparison to other strains of B. cenocepacia, highlighting the genomic plasticity of this species. Pseudogenes in virulence determinants suggest that the pathogenic response of J2315 may have been recently selected to promote persistence in the CF lung. The J2315 genome contains evidence that its unique and highly adapted genetic content has played a significant role in its success as an epidemic CF pathogen.
Enteropathogenic Escherichia coli (EPEC) was the first pathovar of E. coli to be implicated in human disease; however, no EPEC strain has been fully sequenced until now. Strain E2348/69 (serotype O127:H6 belonging to E. coli phylogroup B2) has been used worldwide as a prototype strain to study EPEC biology, genetics, and virulence. Studies of E2348/69 led to the discovery of the locus of enterocyte effacement-encoded type III secretion system (T3SS) and its cognate effectors, which play a vital role in attaching and effacing lesion formation on gut epithelial cells. In this study, we determined the complete genomic sequence of E2348/69 and performed genomic comparisons with other important E. coli strains. We identified 424 E2348/69-specific genes, most of which are carried on mobile genetic elements, and a number of genetic traits specifically conserved in phylogroup B2 strains irrespective of their pathotypes, including the absence of the ETT2-related T3SS, which is present in E. coli strains belonging to all other phylogroups. The genome analysis revealed the entire gene repertoire related to E2348/69 virulence. Interestingly, E2348/69 contains only 21 intact T3SS effector genes, all of which are carried on prophages and integrative elements, compared to over 50 effector genes in enterohemorrhagic E. coli O157. As E2348/69 is the most-studied pathogenic E. coli strain, this study provides a genomic context for the vast amount of existing experimental data. The unexpected simplicity of the E2348/69 T3SS provides the first opportunity to fully dissect the entire virulence strategy of attaching and effacing pathogens in the genomic context.