We report a closed genome of Salmonella enterica subsp. enterica serovar Javiana (S. Javiana). This serotype is a common food-borne pathogen and is often associated with fresh-cut produce. Complete (finished) genome assemblies will support pilot studies testing the utility of next-generation sequencing (NGS) technologies in public health laboratories.
Within the last several years, Salmonella enterica subsp. enterica serovar Agona has been among the 20 most frequently isolated serovars in clinical cases of salmonellosis. In this report, the complete genome sequence of S. Agona strain 460004 2-1, isolated from unsweetened puffed-rice cereal during a multistate outbreak in 2008, was determined using single-molecule real-time DNA sequencing.
We report the genome sequence of Salmonella enterica subsp. enterica serovar Give (CFSAN012622), isolated from imported chili powder in 2014. This genome contains genes previously reported to be specific to S. enterica serovar Enteritidis. This strain shows a unique pulsed-field gel electrophoresis (PFGE) pattern that clusters with serovar Enteritidis (JEGX01.0005).
Virginia is the third-largest producer of fresh-market tomatoes in the United States. Tomatoes grown along the eastern shore of Virginia are implicated almost yearly in Salmonella illnesses. Traceback investigations implicate contamination occurring in the pre-harvest environment. To better understand the ecological niches of Salmonella in the tomato agricultural environment, a 2-year study was undertaken at a regional agricultural research farm in Virginia. Environmental samples, including tomato (fruit, blossoms, and leaves), irrigation water, surface water, and sediment, were collected over the growing season. These samples were analyzed for the presence of Salmonella using modified FDA-BAM methods. Molecular assays were used to screen the samples. Over 1500 samples were tested. Seventy-five samples tested positive for Salmonella, yielding over 230 isolates. The most commonly isolated serovars were S. Newport and S. Javiana, with pulsed-field gel electrophoresis (PFGE) yielding 39 different patterns. Genetic diversity was further underscored among many other serotypes, which showed multiple PFGE subtypes. Whole genome sequencing (WGS) of several S. Newport isolates collected in 2010, compared to clinical isolates associated with tomato consumption, showed very few single nucleotide differences between environmental and clinical isolates, suggesting a source link to Salmonella-contaminated tomatoes. Nearly all isolates collected during two growing seasons of surveillance were obtained from surface water and sediment sources, pointing to these sites as long-term reservoirs for persistent and endemic contamination of this environment.
Salmonella Newport; tomatoes; environmental reservoirs; epidemiological impact; prevalence and diversity
Salmonella enterica subsp. enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. Recently, an antibiotic-resistant strain of this serovar was implicated in a large 2011 multistate outbreak resulting from consumption of contaminated ground turkey that involved 136 confirmed cases, with one death. In this study, we assessed the evolutionary diversity of 44 S. Heidelberg isolates using whole-genome sequencing (WGS) data generated on the 454 GS FLX (Roche) platform. The isolates, including 30 with nearly indistinguishable (one band difference) XbaI pulsed-field gel electrophoresis patterns (JF6X01.0032, JF6X01.0058), were collected from various sources between 1982 and 2011 and included nine isolates associated with the 2011 outbreak. Additionally, we determined the complete sequence for the chromosome and three plasmids from a clinical isolate associated with the 2011 outbreak using the Pacific Biosciences (PacBio) system. Using single-nucleotide polymorphism (SNP) analyses, we were able to distinguish highly clonal isolates, including strains isolated at different times in the same year. The isolates from the recent 2011 outbreak clustered together with a mean SNP variation of only 17 SNPs. The S. Heidelberg isolates carried a variety of phages, such as prophage P22, P4, the lambda-like prophage Gifsy-2, and the P2-like phage that carries the sopE1 gene; virulence genes, including 62 pathogenicity and 13 fimbrial markers; and resistance plasmids of the incompatibility (Inc) groups IncI1, IncA/C, and IncHI2. Twenty-one strains contained an IncX plasmid carrying a type IV secretion system. On the basis of the recent and historical isolates used in this study, our results demonstrated that, in addition to providing detailed genetic information for the isolates, WGS can identify SNP targets that can be utilized for differentiating highly clonal S. Heidelberg isolates.
outbreak; antimicrobial resistance; plasmid; SNP analysis; trace-back
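A mean SNP variation such as the one reported for the 2011 outbreak isolates above can be illustrated with a minimal Python sketch. It assumes the isolates are already represented as equal-length aligned sequences and averages the differences over all pairs; the isolate names and sequences are toy values, and the abstract does not state how its figure was actually computed.

```python
from itertools import combinations
from statistics import mean

def snp_distance(a, b, missing="N-"):
    """Count aligned positions where two sequences differ, skipping missing calls."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in missing and y not in missing
    )

def mean_pairwise_snps(alignment):
    """Mean SNP distance over all unordered pairs of isolates."""
    pairs = combinations(alignment.values(), 2)
    return mean(snp_distance(a, b) for a, b in pairs)

# Toy alignment (hypothetical isolates, not data from the study).
toy = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "ACCTACGA",
}
print(mean_pairwise_snps(toy))
```

Positions with a missing call (N or a gap) are skipped rather than counted as differences, which is one common convention; other pipelines may treat them differently.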
The methylation of DNA bases plays an important role in numerous biological processes including development, gene expression, and DNA replication. Salmonella is an important foodborne pathogen, and methylation in Salmonella is implicated in virulence. Using single molecule real-time (SMRT) DNA-sequencing, we sequenced and assembled the complete genomes of eleven Salmonella enterica isolates from nine different serovars, and analysed the whole-genome methylation patterns of each genome. We describe 16 distinct N6-methyladenine (m6A) methylated motifs, one N4-methylcytosine (m4C) motif, and one combined m6A-m4C motif. Eight of these motifs are novel, i.e., they have not been previously described. We also identified the methyltransferases (MTases) associated with 13 of the motifs. Some motifs are conserved across all Salmonella serovars tested, while others were found only in a subset of serovars. Eight of the nine serovars contained a unique methylated motif that was not found in any other serovar (most of these motifs were part of Type I restriction modification systems), indicating the high diversity of methylation patterns present in Salmonella.
Multidrug-resistant (MDR) Salmonella enterica subsp. enterica serotype Newport has been a long-standing public health concern in the United States. We present the complete sequences of six IncA/C plasmids from animal-derived MDR S. Newport ranging from 80.1 to 158.5 kb. They shared a genetic backbone with S. Newport IncA/C plasmids pSN254 and pAM04528.
Vibrio parahaemolyticus is the leading cause of foodborne illnesses in the US associated with the consumption of raw shellfish. Previous population studies of V. parahaemolyticus have used Multi-Locus Sequence Typing (MLST) or Pulsed Field Gel Electrophoresis (PFGE). Whole genome sequencing (WGS) provides a much higher level of resolution, but has been used to characterize only a few United States (US) clinical isolates. Here we report the WGS characterization of 34 genomes of V. parahaemolyticus strains that were isolated from clinical cases in the state of Maryland (MD) during 2 years (2012–2013). These 2 years saw an increase of V. parahaemolyticus cases compared to previous years. Among these MD isolates, 28% were negative for tdh and trh, 8% were tdh positive only, 11% were trh positive only, and 53% contained both genes. We compared this set of V. parahaemolyticus genomes to those of a collection of 17 archival strains from the US (10 previously sequenced strains and 7 from NCBI, collected between 1988 and 2004) and 15 international strains, isolated from geographically-diverse environmental and clinical sources (collected between 1980 and 2010). A WGS phylogenetic analysis of these strains revealed the regional outbreak strains from MD are highly diverse and yet genetically distinct from the international strains. Some MD strains caused outbreaks 2 years in a row, indicating a local source of contamination (e.g., ST631). Advances in WGS will enable this type of analysis to become routine, providing an excellent tool for improved surveillance. Databases built with phylogenetic data will help pinpoint sources of contamination in future outbreaks and contribute to faster outbreak control.
NGS; WGS; Vibrio parahaemolyticus; clinical; phylogenetic analysis; phylogeny; SNPs
Shiga toxin-producing Escherichia coli (STEC) O26 is the second leading E. coli serogroup responsible for human illness outbreaks, after E. coli O157:H7. Recent outbreaks have been linked to emerging pathogenic O26:H11 strains harboring stx2 only. Cattle have been recognized as an important reservoir of O26 strains harboring stx1; however, the reservoir of these emerging stx2 strains is unknown. The objective of this study was to identify nucleotide polymorphisms in human- and cattle-derived strains in order to compare differences in polymorphism-derived genotypes and virulence gene profiles between the two host species. Whole genome sequencing was performed on 182 epidemiologically unrelated O26 strains, including 109 human-derived strains and 73 non-human-derived strains. A panel of 289 O26 strains (241 STEC and 48 non-STEC) was subsequently genotyped using a set of 283 polymorphisms identified by whole genome sequencing, resulting in 64 unique genotypes. Phylogenetic analyses identified seven clusters within the O26 strains. The seven clusters did not distinguish between isolates originating from humans or cattle; however, clusters did correspond with particular virulence gene profiles. Human- and non-human-derived strains harboring stx1 clustered separately from strains harboring stx2, strains harboring eae, and non-STEC strains. Strains harboring stx2 were more closely related to non-STEC strains and strains harboring eae than to strains harboring stx1. The finding of human- and cattle-derived strains with the same polymorphism-derived genotypes and similar virulence gene profiles provides evidence that similar strains are found in cattle and humans and that transmission between the two species may occur.
Escherichia coli; O26; Shiga toxins; polymorphisms; phylogenetic
Phage typing has been used for the epidemiological surveillance of Salmonella enterica serovar Enteritidis for over 2 decades. However, knowledge of the genetic and evolutionary relationships between phage types is very limited, making differences difficult to interpret. Here, single nucleotide polymorphisms (SNPs) identified from whole-genome comparisons were used to determine the relationships between some S. Enteritidis phage types (PTs) commonly associated with food-borne outbreaks in the United States. Emphasis was placed on the predominant phage types PT8, PT13a, and PT13 in North America. With >89,400 bp surveyed across 98 S. Enteritidis isolates representing 14 distinct phage types, 55 informative SNPs were discovered within 23 chromosomally anchored loci. To maximize the discriminatory and evolutionary partitioning of these highly homogeneous strains, sequences comprising informative SNPs were concatenated into a single combined data matrix and subjected to phylogenetic analysis. The resultant phylogeny allocated most S. Enteritidis isolates into two distinct clades (clades I and II) and four subclades. Synapomorphic (shared and derived) sets of SNPs capable of distinguishing individual clades/subclades were identified. However, individual phage types appeared to be evolutionarily disjunct when mapped to this phylogeny, suggesting that phage typing may not be valid for making phylogenetic inferences. Furthermore, the set of SNPs identified here represents useful genetic markers for strain differentiation of more clonal S. Enteritidis strains and provides core genotypic markers for future development of a SNP typing scheme with S. Enteritidis.
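The concatenation of informative SNPs into a single data matrix described above can be sketched in Python. This sketch assumes "informative" means parsimony-informative (at least two character states, each present in at least two isolates), which the abstract does not specify; the locus and isolate names are hypothetical.

```python
from collections import Counter

def is_informative(column):
    """True if at least two bases each occur in two or more isolates."""
    counts = Counter(b for b in column if b in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2

def concatenate_informative(loci):
    """Concatenate informative sites from several aligned loci.

    loci: list of dicts mapping isolate name -> aligned sequence.
    Returns a dict mapping isolate name -> concatenated informative sites.
    """
    isolates = sorted(loci[0])
    matrix = {name: [] for name in isolates}
    for locus in loci:
        length = len(locus[isolates[0]])
        for i in range(length):
            column = [locus[name][i] for name in isolates]
            if is_informative(column):
                for name, base in zip(isolates, column):
                    matrix[name].append(base)
    return {name: "".join(bases) for name, bases in matrix.items()}

# Two toy loci for four hypothetical isolates.
locus1 = {"s1": "ACGT", "s2": "ACGT", "s3": "ATGT", "s4": "ATGA"}
locus2 = {"s1": "GG", "s2": "GA", "s3": "AA", "s4": "AG"}
print(concatenate_informative([locus1, locus2]))
```

The last column of the first toy locus differs in only one isolate, so it is variable but not informative under this definition and is dropped from the combined matrix.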
Salmonella enterica subsp. enterica serovar Cubana (Salmonella serovar Cubana) is associated with human and animal disease. Here, we used third-generation, single-molecule, real-time DNA sequencing to determine the first complete genome sequence of Salmonella serovar Cubana CFSAN002050, which was isolated from fresh alfalfa sprouts during a multistate outbreak in 2012.
Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms where the MiSeq had the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. The methods supported by our results represent a preliminary set of guidelines and a step towards developing validated standards for clustering based on whole genome sequence data.
Salmonella; Outbreak; Congruence; Phylogenetics; Next generation sequencing; Single nucleotide polymorphism
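The distinction between a core SNP matrix (no missing data) and a non-core matrix (some missing data allowed) in the abstract above can be illustrated with a short Python sketch. The function name, the `max_missing_fraction` parameter, and the toy matrix are assumptions for illustration and do not come from the study.

```python
def filter_sites(snp_matrix, max_missing_fraction=0.0, missing="N-"):
    """Keep SNP columns whose fraction of missing calls is within the limit.

    snp_matrix: dict mapping isolate name -> string of SNP calls.
    max_missing_fraction=0.0 gives a core matrix (no missing data);
    larger values give a non-core matrix.
    """
    names = list(snp_matrix)
    n_sites = len(snp_matrix[names[0]])
    keep = []
    for i in range(n_sites):
        n_missing = sum(1 for name in names if snp_matrix[name][i] in missing)
        if n_missing / len(names) <= max_missing_fraction:
            keep.append(i)
    return {name: "".join(snp_matrix[name][i] for i in keep) for name in names}

# Toy matrix for four hypothetical isolates.
toy_matrix = {"a": "ACGN", "b": "A-GT", "c": "TCGT", "d": "TCAT"}
core = filter_sites(toy_matrix)                                 # 2 sites survive
non_core = filter_sites(toy_matrix, max_missing_fraction=0.25)  # all 4 survive
print(core)
print(non_core)
```

In the toy data the core matrix discards half of the sites, which mirrors the trade-off the abstract describes: a strict core loses signal, while a very permissive threshold risks admitting unreliable SNP calls.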
For Salmonella enterica serovar Enteritidis, 85% of isolates can be classified into 5 pulsed-field gel electrophoresis (PFGE) types. However, PFGE has limited discriminatory power for outbreak detection. Although whole-genome sequencing has been found to improve discrimination of outbreak clusters, whether this procedure can be used in real-time in a public health laboratory is not known. Therefore, we conducted a retrospective and prospective analysis. The retrospective study investigated isolates from 1 confirmed outbreak. Additional cases could be attributed to the outbreak strain on the basis of whole-genome data. The prospective study included 58 isolates obtained in 2012, including isolates from 1 epidemiologically defined outbreak. Whole-genome sequencing identified additional isolates that could be attributed to the outbreak, but which differed from the outbreak-associated PFGE type. Additional putative outbreak clusters were detected in the retrospective and prospective analyses. This study demonstrates the practicality of implementing this approach for outbreak surveillance in a state public health laboratory.
Salmonella enterica serovar Enteritidis; bacteria; high-throughput nucleotide sequencing; whole-genome sequencing; pulsed-field gel electrophoresis; infectious disease outbreaks; public health laboratory surveillance
Next-generation sequencing is being evaluated for use with food-borne illness investigations, especially when the outbreak strains produce patterns that cannot be discriminated from non-outbreak strains using conventional procedures. Here we report complete genome assemblies of two Salmonella enterica serovar Heidelberg strains with a common pulsed-field gel electrophoresis pattern isolated during an outbreak investigation.
Evolutionary studies of clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (cas) genes can provide insights into host-pathogen co-evolutionary dynamics and the frequency at which different genomic events (e.g., horizontal vs. vertical transmission) occur. Within this study, we used whole genome sequence (WGS) data to determine the evolutionary history and genetic diversity of CRISPR loci and cas genes among a diverse set of 427 Salmonella enterica ssp. enterica isolates representing 64 different serovars. We also evaluated the performance of CRISPR loci for typing when compared to whole genome and multilocus sequence typing (MLST) approaches. We found that there was high diversity in array length within both CRISPR1 (median = 22; min = 3; max = 79) and CRISPR2 (median = 27; min = 2; max = 221). There was also much diversity within serovars (e.g., arrays differed by as many as 50 repeat-spacer units among Salmonella ser. Senftenberg isolates). Interestingly, we found that there are two general cas gene profiles that do not track phylogenetic relationships, which suggests that non-vertical transmission events have occurred frequently throughout the evolutionary history of the sampled isolates. There is also considerable variation among the ranges of pairwise distances estimated within each cas gene, which may be indicative of the strength of natural selection acting on those genes. We developed a novel clustering approach based on CRISPR spacer content, but found that typing based on CRISPRs was less accurate than the MLST-based alternative; typing based on WGS data was the most accurate. Notwithstanding cost and accessibility, we anticipate that draft genome sequencing, due to its greater discriminatory power, will eventually become routine for traceback investigations.
Salmonella; Horizontal gene transfer; Evolution; CRISPR; Outbreak; Phylogeny; Whole genome sequencing; Typing
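The CRISPR abstract above summarizes array lengths and clusters isolates by spacer content, without describing its clustering method. As a stand-in, the Python sketch below uses the Jaccard distance between spacer sets, a generic measure chosen here for illustration and not the authors' approach; the isolate and spacer names are hypothetical.

```python
from statistics import median

def jaccard_distance(spacers_a, spacers_b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of spacer identifiers."""
    a, b = set(spacers_a), set(spacers_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def array_length_summary(arrays):
    """Median, minimum, and maximum number of spacers per array."""
    lengths = [len(spacers) for spacers in arrays.values()]
    return median(lengths), min(lengths), max(lengths)

# Toy CRISPR arrays: ordered lists of spacer identifiers per isolate.
arrays = {
    "iso1": ["sp1", "sp2", "sp3"],
    "iso2": ["sp1", "sp2", "sp4"],
    "iso3": ["sp9"],
}
print(array_length_summary(arrays))
print(jaccard_distance(arrays["iso1"], arrays["iso2"]))
```

A pairwise distance matrix built this way could be passed to any standard hierarchical clustering routine; spacer order is ignored here, which discards information that an order-aware method would keep.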
Salmonella enterica subsp. enterica serovar Typhimurium is a leading cause of salmonellosis. Here, we report a closed genome sequence, including sequences of 3 plasmids, of Salmonella serovar Typhimurium var. 5− CFSAN001921 (National Antimicrobial Resistance Monitoring System [NARMS] strain ID N30688), which was isolated from chicken breast meat and shows resistance to 10 different antimicrobials. Whole-genome and plasmid sequence analyses of this isolate will help enhance our understanding of this pathogenic multidrug-resistant serovar.
Comparative methods for analyzing whole genome sequence (WGS) data enable us to assess the genetic information available for reconstructing the evolutionary history of pathogens. We used the comparative approach to determine diagnostic genes for Salmonella enterica subspecies I. S. enterica subsp. I strains are known to infect warm-blooded organisms regularly, while their close relatives tend to infect only cold-blooded organisms. We found 71 genes gained by the common ancestor of Salmonella enterica subspecies I and not subsequently lost by any member of this subspecies sequenced to date. These genes were associated with many putative functions. Twenty-seven of these genes are found only in Salmonella enterica subspecies I; we designed primers to test these genes for use as diagnostic sequence targets and mined the NCBI Sequence Read Archive (SRA) database for draft genomes that carried these genes. We found that the sequence specificity and variability of these amplicons can be used to detect and discriminate among 317 different serovars and strains of Salmonella enterica subspecies I.
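The selection of genes that are present in every subspecies I genome and absent from all other genomes reduces, at its simplest, to set operations on gene presence lists. The Python sketch below assumes gene presence has already been called per genome; it ignores the ancestral-state reconstruction that a gain/loss analysis would involve, and all genome and gene names are hypothetical.

```python
def diagnostic_genes(ingroup, outgroup):
    """Genes found in every ingroup genome and in no outgroup genome.

    ingroup, outgroup: dicts mapping genome name -> iterable of gene names.
    """
    if not ingroup:
        raise ValueError("at least one ingroup genome is required")
    shared = set.intersection(*(set(genes) for genes in ingroup.values()))
    seen_outside = set().union(*(set(genes) for genes in outgroup.values()))
    return shared - seen_outside

# Toy presence lists (hypothetical genomes and genes).
subsp_I = {
    "subspI_a": {"g1", "g2", "g3"},
    "subspI_b": {"g1", "g2", "g4"},
}
others = {"other_subsp": {"g2", "g5"}}
print(diagnostic_genes(subsp_I, others))
```

Here g1 and g2 are shared by both ingroup genomes, but g2 also occurs outside the group, so only g1 qualifies as a candidate diagnostic target.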
The enteric pathogen Salmonella enterica is one of the leading causes of foodborne illness in the world. The species is extremely diverse, containing more than 2,500 named serovars that are designated for their unique antigen characters and pathogenicity profiles; some are known to be virulent pathogens, while others are not. Questions regarding the evolution of pathogenicity, the significance of antigen characters, and the diversity of clustered regularly interspaced short palindromic repeat (CRISPR) loci, among others, will remain unanswered until a strong evolutionary framework is established. We present the first large-scale S. enterica subsp. enterica phylogeny inferred from a new reference-free k-mer approach of gathering single nucleotide polymorphisms (SNPs) from whole genomes. The phylogeny of 156 isolates representing 78 serovars (102 were newly sequenced) reveals two major lineages, each with many strongly supported sublineages. One of these lineages is the S. Typhi group, which is well nested within the phylogeny. Lineage-through-time analyses suggest there have been two instances of accelerated rates of diversification within the subspecies. We also found that antigen characters and CRISPR loci reveal evolutionary patterns different from that of the phylogeny, suggesting that horizontal gene transfer or possibly a shared environmental acquisition might have influenced the present character distribution. Our study also shows the ability to extract reference-free SNPs from a large set of genomes and then to use these SNPs for phylogenetic reconstruction. This automated, annotation-free approach is an important step forward for bacterial disease tracking and in efficiently elucidating the evolutionary history of highly clonal organisms.
H antigens; serovar; O antigens; CRISPR; lineage-through-time plot; comparative method
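The general idea behind reference-free k-mer SNP discovery, as used in the phylogeny abstract above, can be sketched in Python: k-mers of odd length that share both flanks across genomes but differ at the central base mark a candidate SNP. This is a simplified illustration under stated assumptions (reverse complements are ignored, sequences are toy strings), not the specific tool or parameters used in the study.

```python
from collections import defaultdict

def kmer_snps(genomes, k=7):
    """Find candidate SNPs as shared k-mer flanks with differing central bases.

    genomes: dict mapping genome name -> sequence string.
    Returns a dict mapping masked k-mer (central base shown as '.') to
    {genome name: central base}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so there is a single central base")
    half = k // 2
    alleles = defaultdict(dict)  # masked k-mer -> {genome: set of central bases}
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            flank = kmer[:half] + "." + kmer[half + 1:]
            alleles[flank].setdefault(name, set()).add(kmer[half])
    snps = {}
    for flank, per_genome in alleles.items():
        # Skip flanks that are ambiguous within a single genome (e.g. repeats).
        if any(len(bases) > 1 for bases in per_genome.values()):
            continue
        calls = {name: next(iter(bases)) for name, bases in per_genome.items()}
        if len(set(calls.values())) > 1:
            snps[flank] = calls
    return snps

# Two toy genomes differing at a single position.
toy_genomes = {"g1": "AACGTTGCA", "g2": "AACGATGCA"}
print(kmer_snps(toy_genomes, k=5))
```

Because no reference genome or annotation is needed, the same procedure applies to any set of assemblies; the resulting per-genome base calls can be concatenated into a SNP matrix for tree building.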
Here, we report draft genomes of Paenibacillus alvei strains A6-6i and TS-15, which were isolated, respectively, from plant material and soil in the Virginia Eastern Shore (VES) tomato growing area. An array of genes related to antimicrobial biosynthetic pathways have been identified with whole-genome analyses of these strains.
An assay to identify the common food-borne pathogens Salmonella, Escherichia coli, Shigella, and Listeria monocytogenes was developed in collaboration with Ibis Biosciences (a division of Abbott Molecular) for the Plex-ID biosensor system, a platform that uses electrospray ionization mass spectroscopy (ESI-MS) to detect the base composition of short PCR amplicons. The new food-borne pathogen (FBP) plate has been experimentally designed using four gene segments for a total of eight amplicon targets. Initial work built a DNA base count database that contains more than 140 Salmonella enterica, 139 E. coli, 11 Shigella, and 36 Listeria patterns and 18 other Enterobacteriaceae organisms. This assay was tested to determine the scope of the assay's ability to detect and differentiate the enteric pathogens and to improve the reference database associated with the assay. More than 800 bacterial isolates of S. enterica, E. coli, and Shigella species were analyzed. Overall, 100% of S. enterica, 99% of E. coli, and 73% of Shigella spp. were detected using this assay. The assay was also able to identify 30% of the S. enterica serovars to the serovar level. To further characterize the assay, spiked food matrices and food samples collected during regulatory field work were also studied. While analysis of preenrichment media was inconsistent, identification of S. enterica from selective enrichment media resulted in serovar-level identifications for 8 of 10 regulatory samples. The results of this study suggest that this high-throughput method may be useful in clinical and regulatory laboratories testing for these pathogens.
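The matching step behind a base-composition assay can be illustrated with a small Python sketch. In the platform described above, base composition is derived from mass measurements of PCR amplicons; here it is simply counted from a known sequence, and the database entries, organism labels, and amplicon are hypothetical.

```python
from collections import Counter

def base_composition(amplicon):
    """Return the (A, C, G, T) counts of an amplicon sequence."""
    counts = Counter(amplicon.upper())
    return tuple(counts.get(base, 0) for base in "ACGT")

def identify(amplicon, database):
    """Names of database entries whose base composition matches exactly."""
    comp = base_composition(amplicon)
    return sorted(name for name, ref in database.items() if ref == comp)

# Hypothetical reference database: organism label -> (A, C, G, T) counts.
database = {
    "organism_X": (3, 2, 2, 1),
    "organism_Y": (2, 2, 2, 2),
}
print(identify("AAACCGGT", database))
```

A single amplicon composition can be shared by several organisms, which is one reason an assay of this kind combines multiple amplicon targets before making an identification.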
Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project.
Facile laboratory tools are needed to augment identification in contamination events to trace the contamination back to the source (traceback) of Salmonella enterica subsp. enterica serovar Enteritidis (S. Enteritidis). Understanding the evolution and diversity within and among outbreak strains is the first step towards this goal. To this end, we collected 106 new S. Enteritidis isolates within S. Enteritidis Pulsed-Field Gel Electrophoresis (PFGE) pattern JEGX01.0004 and close relatives, and determined their genome sequences. Sources for these isolates spanned food, clinical and environmental farm sources collected during the 2010 S. Enteritidis shell egg outbreak in the United States along with closely related serovars, S. Dublin, S. Gallinarum biovar Pullorum and S. Gallinarum. Despite the highly homogeneous structure of this population, S. Enteritidis isolates examined in this study revealed thousands of SNP differences and numerous variable genes (n = 366). Twenty-one of these genes from the lineages leading to outbreak-associated samples had nonsynonymous changes (causing amino acid changes), and five genes are putatively involved in known Salmonella virulence pathways. While chromosome synteny and genome organization appeared to be stable among these isolates, genome size differences were observed due to variation in the presence or absence of several phages and plasmids, including phage RE-2010, phage P125109, plasmid pSEEE3072_19 (similar to pSENV), plasmid pOU1114 and two newly observed mobile plasmid elements pSEEE1729_15 and pSEEE0956_35. These differences produced modifications to the assembled bases for these draft genomes in the size range of approximately 4.6 to 4.8 Mbp, with S. Dublin being larger (∼4.9 Mbp) and S. Gallinarum smaller (4.55 Mbp) when compared to S. Enteritidis. Finally, we identified variable S. Enteritidis genes associated with virulence pathways that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future outbreaks involving S. Enteritidis PFGE pattern JEGX01.0004.
Salmonella enterica is recognized as one of the most common bacterial agents of foodborne illness. We report draft genomes of four Salmonella serovar Heidelberg isolates associated with the recent multistate outbreak of human Salmonella Heidelberg infections linked to kosher broiled chicken livers in the United States in 2011. Isolates 2011K-1259 and 2011K-1232 were recovered from humans, whereas 2011K-1724 and 2011K-1726 were isolated from chicken liver. Whole genome sequence analysis of these isolates provides a tool for studying the short-term evolution of these epidemic clones and can be used for characterizing potentially new virulence factors.