Genome Announc. 2017 December; 5(50): e01321-17.
Published online 2017 December 14. doi: 10.1128/genomeA.01321-17
PMCID: PMC5730671

Species-Wide Collection of Escherichia coli Isolates for Examination of Genomic Diversity


Pathogenic and nonpathogenic Escherichia coli strains present a vast genomic diversity. We report the genome sequences of 2,244 E. coli isolates from multiple animal and environmental sources. Their phylogenetic relationships and potential risk to human health were examined.


Most Escherichia coli strains are harmless commensals that are found as part of the gut microbiota. Some strains, however, have the ability to cause disease and are considered pathogenic. Phylogenetic analyses have shown that E. coli strains can be divided into several phylogroups (1, 2), with pathogenic and nonpathogenic strains randomly distributed among them. To facilitate the study of the genomic diversity of this species, we sequenced a collection of 2,244 E. coli isolates from multiple mammalian (including human), avian, and environmental sources. This diverse collection contains nonpathogenic strains as well as several different pathogenic types (i.e., pathotypes), including attaching and effacing (AEEC), enteroaggregative (EAEC), enteroinvasive (EIEC), enterotoxigenic (ETEC), and Shiga toxin-producing (STEC) E. coli (3). Pathotypes are often defined by differing sets of virulence-associated genes. Many of these genes are carried on mobile genetic elements that can be transferred among strains, resulting in new combinations and several hybrid pathotypes such as STEC/EAEC (4), STEC/ETEC (5–7), and AEEC/ETEC (8). In this report, we characterized the 2,244 E. coli isolates based on their phylogenetic relationships and their potential risk to human health. The information reported here will help to better understand the evolution of these emergent foodborne pathogens and improve the accuracy of trace-back investigations during outbreaks caused by them.

Pure cultures for each strain were grown aerobically overnight in Luria-Bertani broth at 37°C. Total genomic DNA was extracted from 1 ml of overnight culture using the DNeasy blood and tissue kit (Qiagen, Hilden, Germany). DNA extractions were performed with the Qiagen QIAcube instrument using the manufacturer’s Gram-negative bacteria protocol. Sequencing libraries were prepared with 1 ng of DNA using the Nextera XT DNA sample prep kit (Illumina, San Diego, CA, USA) and sequenced on either the Illumina MiSeq or NextSeq platform. The resulting paired-end reads were quality controlled using FastQC (Q > 30) and de novo assembled using SPAdes 3.8.2 (9) or CLC Genomics Workbench 8.2.1 (CLC bio, Aarhus, Denmark).
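The quality threshold mentioned above (Q > 30) refers to Phred quality scores. The text does not describe how the threshold was applied, so the following is only a minimal sketch of one possible check: it computes the mean Phred score of a single read from its FASTQ quality string, assuming Phred+33 encoding. The function names `mean_phred` and `passes_q30` are hypothetical and are not part of FastQC.

```python
def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Return the arithmetic mean Phred score of one read.

    Assumes Phred+33 encoding (an assumption, not stated in the text):
    each quality character encodes the score ord(char) - 33.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_q30(quality_string: str, threshold: float = 30.0) -> bool:
    """Hypothetical per-read filter: keep reads whose mean score exceeds the threshold."""
    return mean_phred(quality_string) > threshold


if __name__ == "__main__":
    # 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
    print(mean_phred("IIII"), passes_q30("IIII"))
    print(mean_phred("5555"), passes_q30("5555"))
```

A strict comparison is used so that a read averaging exactly Q30 does not pass, mirroring the "Q > 30" wording.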

Depth of coverage for the draft genomes ranged from 20× to 200× with the genome sizes ranging from 4,412,939 to 5,984,698 bp. The number of contigs ranged from 39 to 1,110, while the N50 values ranged from 14,741 to 699,676 bp. Each of the established E. coli phylogroups is represented in the sequenced strain collection as follows: A, 23%; B1, 47%; B2, 13%; D, 6%; E, 9%; and F, 2%. The strains were also screened for the presence of known or putative virulence factors, such as aggR, eae, ipaH, LT, ST, stx1, and stx2. Out of the 2,244 isolates, 394 can be classified as AEEC, 23 as EAEC, 9 as EIEC, 134 as ETEC, and 402 as STEC. Several strains were found to possess factors associated with hybrid pathotypes: STEC/ETEC (n = 22), AEEC/ETEC (n = 2), and STEC/EAEC (n = 1).
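The virulence-factor screen above can be illustrated with a small sketch that maps detected markers to pathotype labels and reports hybrids when markers from more than one pathotype are present. The marker-to-pathotype mapping used here (eae to AEEC, aggR to EAEC, ipaH to EIEC, LT or ST to ETEC, stx1 or stx2 to STEC) is an assumption based on the conventional use of these markers; the authors' exact classification rules are not given in the text, and the names `MARKER_TO_PATHOTYPE` and `classify` are hypothetical. In particular, this sketch would label a strain carrying both stx and eae as "STEC/AEEC", a combination the text does not list among its hybrids.

```python
# Assumed marker-to-pathotype mapping (illustrative, not the authors' rules).
MARKER_TO_PATHOTYPE = {
    "eae": "AEEC",
    "aggR": "EAEC",
    "ipaH": "EIEC",
    "LT": "ETEC",
    "ST": "ETEC",
    "stx1": "STEC",
    "stx2": "STEC",
}

# Fixed label order so hybrids are named as in the text (e.g. STEC/ETEC).
LABEL_ORDER = ["STEC", "AEEC", "EAEC", "EIEC", "ETEC"]


def classify(markers):
    """Return a pathotype label for a collection of detected marker names.

    Unknown markers are ignored. Markers from more than one pathotype
    yield a slash-joined hybrid label.
    """
    found = {MARKER_TO_PATHOTYPE[m] for m in markers if m in MARKER_TO_PATHOTYPE}
    ordered = [label for label in LABEL_ORDER if label in found]
    return "/".join(ordered) if ordered else "none detected"


if __name__ == "__main__":
    print(classify({"stx1", "LT"}))
    print(classify({"eae", "ST"}))
    print(classify({"stx2", "aggR"}))
    print(classify(set()))
```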

Accession number(s).

The draft genome assemblies were deposited in DDBJ/ENA/GenBank through FDA’s GenomeTrakr pipeline under BioProject PRJNA230969 with accession numbers NJIZ00000000 to NJNL00000000, NJRR00000000 to NKAI00000000, NKDC00000000 to NKEV00000000, NKLT00000000 to NKPS00000000, NKUK00000000 to NKVR00000000, NLFN00000000 to NMOH00000000, NNSX00000000 to NOIE00000000, NOMB00000000 to NOUU00000000, NOWO00000000 to NOWP00000000, NTND00000000 to NTPX00000000, NVPS00000000, NWNA00000000 to NWQF00000000, and NXMG00000000 to NXNF00000000. The versions described in this announcement are the first versions.
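For readers who want to enumerate the individual records, the accession ranges above can be expanded programmatically. The following is a minimal sketch that assumes each range covers every four-letter prefix in alphabetical (base-26) order between its endpoints, with the numeric suffix unchanged; that assumption is not stated in the text, and the function names are hypothetical.

```python
def prefix_to_int(prefix: str) -> int:
    """Interpret an uppercase letter prefix as a base-26 number (A=0 ... Z=25)."""
    value = 0
    for ch in prefix:
        if not "A" <= ch <= "Z":
            raise ValueError(f"unexpected character in prefix: {ch!r}")
        value = value * 26 + (ord(ch) - ord("A"))
    return value


def int_to_prefix(value: int, width: int = 4) -> str:
    """Inverse of prefix_to_int for a fixed-width prefix."""
    letters = []
    for _ in range(width):
        value, remainder = divmod(value, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def expand_range(first: str, last: str) -> list:
    """List every accession between first and last, inclusive.

    Assumes (hypothetically) that the first four characters are the
    varying prefix and the remaining characters are a shared suffix.
    """
    start, end = prefix_to_int(first[:4]), prefix_to_int(last[:4])
    if end < start:
        raise ValueError("range endpoints are out of order")
    suffix = first[4:]
    return [int_to_prefix(v) + suffix for v in range(start, end + 1)]


if __name__ == "__main__":
    print(expand_range("NOWO00000000", "NOWP00000000"))
    print(len(expand_range("NJIZ00000000", "NJNL00000000")))
```

Single accessions such as NVPS00000000 need no expansion and can be added to the resulting list directly.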


The views expressed in this article are those of the authors and do not necessarily reflect the official policy of the Department of Health and Human Services, the U.S. Food and Drug Administration (FDA), or the U.S. Government. Reference to any commercial materials, equipment, or process does not in any way constitute approval, endorsement, or recommendation by the FDA.

Part of this study was supported by the ORISE Fellowship Program.


Citation: Gangiredla J, Mammel MK, Barnaba TJ, Tartera C, Gebru ST, Patel IR, Leonard SR, Kotewicz ML, Lampel KA, Elkins CA, Lacher DW. 2017. Species-wide collection of Escherichia coli isolates for examination of genomic diversity. Genome Announc 5:e01321-17.


1. Herzer PJ, Inouye S, Inouye M, Whittam TS. 1990. Phylogenetic distribution of branched RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli. J Bacteriol 172:6175–6181. doi:10.1128/jb.172.11.6175-6181.1990
2. Clermont O, Christenson JK, Denamur E, Gordon DM. 2013. The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups. Environ Microbiol Rep 5:58–65. doi:10.1111/1758-2229.12019
3. Kaper JB, Nataro JP, Mobley HL. 2004. Pathogenic Escherichia coli. Nat Rev Microbiol 2:123–140. doi:10.1038/nrmicro818
4. Rasko DA, Webster DR, Sahl JW, Bashir A, Boisen N, Scheutz F, Paxinos EE, Sebra R, Chin CS, Iliopoulos D, Klammer A, Peluso P, Lee L, Kislyuk AO, Bullard J, Kasarskis A, Wang S, Eid J, Rank D, Redman JC, Steyert SR, Frimodt-Møller J, Struve C, Petersen AM, Krogfelt KA, Nataro JP, Schadt EE, Waldor MK. 2011. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. N Engl J Med 365:709–717. doi:10.1056/NEJMoa1106920
5. Monday SR, Keys C, Hanson P, Shen Y, Whittam TS, Feng P. 2006. Produce isolates of the Escherichia coli Ont:H52 serotype that carry both Shiga toxin 1 and stable toxin genes. Appl Environ Microbiol 72:3062–3065. doi:10.1128/AEM.72.4.3062-3065.2006
6. Nyholm O, Halkilahti J, Wiklund G, Okeke U, Paulin L, Auvinen P, Haukka K, Siitonen A. 2015. Comparative genomics and characterization of hybrid Shigatoxigenic and enterotoxigenic Escherichia coli (STEC/ETEC) strains. PLoS One 10:e0135936. doi:10.1371/journal.pone.0135936
7. Leonard SR, Mammel MK, Rasko DA, Lacher DW. 2016. Hybrid Shiga toxin-producing and enterotoxigenic Escherichia sp. cryptic lineage 1 strain 7v harbors a hybrid plasmid. Appl Environ Microbiol 82:4309–4319.
8. Dutta S, Pazhani GP, Nataro JP, Ramamurthy T. 2015. Heterogenic virulence in a diarrheagenic Escherichia coli: evidence for an EPEC expressing heat-labile toxin of ETEC. Int J Med Microbiol 305:47–54. doi:10.1016/j.ijmm.2014.10.006
9. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi:10.1089/cmb.2012.0021

Articles from Genome Announcements are provided here courtesy of American Society for Microbiology (ASM)