We report a closed genome of Salmonella enterica subsp. enterica serovar Javiana (S. Javiana). This serotype is a common food-borne pathogen and is often associated with fresh-cut produce. Complete (finished) genome assemblies will support pilot studies testing the utility of next-generation sequencing (NGS) technologies in public health laboratories.
Phage typing has been used for the epidemiological surveillance of Salmonella enterica serovar Enteritidis for over 2 decades. However, knowledge of the genetic and evolutionary relationships between phage types is very limited, making differences difficult to interpret. Here, single nucleotide polymorphisms (SNPs) identified from whole-genome comparisons were used to determine the relationships between some S. Enteritidis phage types (PTs) commonly associated with food-borne outbreaks in the United States. Emphasis was placed on the predominant phage types PT8, PT13a, and PT13 in North America. With >89,400 bp surveyed across 98 S. Enteritidis isolates representing 14 distinct phage types, 55 informative SNPs were discovered within 23 chromosomally anchored loci. To maximize the discriminatory and evolutionary partitioning of these highly homogeneous strains, sequences comprising informative SNPs were concatenated into a single combined data matrix and subjected to phylogenetic analysis. The resultant phylogeny allocated most S. Enteritidis isolates into two distinct clades (clades I and II) and four subclades. Synapomorphic (shared and derived) sets of SNPs capable of distinguishing individual clades/subclades were identified. However, individual phage types appeared to be evolutionarily disjunct when mapped to this phylogeny, suggesting that phage typing may not be valid for making phylogenetic inferences. Furthermore, the set of SNPs identified here represents useful genetic markers for strain differentiation of more clonal S. Enteritidis strains and provides core genotypic markers for future development of a SNP typing scheme with S. Enteritidis.
Salmonella enterica subsp. enterica serovar Cubana (Salmonella serovar Cubana) is associated with human and animal disease. Here, we used third-generation, single-molecule, real-time DNA sequencing to determine the first complete genome sequence of Salmonella serovar Cubana CFSAN002050, which was isolated from fresh alfalfa sprouts during a multistate outbreak in 2012.
Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms where the MiSeq had the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. The methods supported by our results represent a preliminary set of guidelines and a step towards developing validated standards for clustering based on whole genome sequence data.
Salmonella; Outbreak; Congruence; Phylogenetics; Next generation sequencing; Single nucleotide polymorphism
For Salmonella enterica serovar Enteritidis, 85% of isolates can be classified into 5 pulsed-field gel electrophoresis (PFGE) types. However, PFGE has limited discriminatory power for outbreak detection. Although whole-genome sequencing has been found to improve discrimination of outbreak clusters, whether this procedure can be used in real-time in a public health laboratory is not known. Therefore, we conducted a retrospective and prospective analysis. The retrospective study investigated isolates from 1 confirmed outbreak. Additional cases could be attributed to the outbreak strain on the basis of whole-genome data. The prospective study included 58 isolates obtained in 2012, including isolates from 1 epidemiologically defined outbreak. Whole-genome sequencing identified additional isolates that could be attributed to the outbreak, but which differed from the outbreak-associated PFGE type. Additional putative outbreak clusters were detected in the retrospective and prospective analyses. This study demonstrates the practicality of implementing this approach for outbreak surveillance in a state public health laboratory.
Salmonella enterica serovar Enteritidis; bacteria; high-throughput nucleotide sequencing; whole-genome sequencing; pulsed-field gel electrophoresis; infectious disease outbreaks; public health laboratory surveillance
Next-generation sequencing is being evaluated for use with food-borne illness investigations, especially when the outbreak strains produce patterns that cannot be discriminated from non-outbreak strains using conventional procedures. Here we report complete genome assemblies of two Salmonella enterica serovar Heidelberg strains with a common pulsed-field gel electrophoresis pattern isolated during an outbreak investigation.
Evolutionary studies of clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (cas) genes can provide insights into host-pathogen co-evolutionary dynamics and the frequency at which different genomic events (e.g., horizontal vs. vertical transmission) occur. Within this study, we used whole genome sequence (WGS) data to determine the evolutionary history and genetic diversity of CRISPR loci and cas genes among a diverse set of 427 Salmonella enterica ssp. enterica isolates representing 64 different serovars. We also evaluated the performance of CRISPR loci for typing when compared to whole genome and multilocus sequence typing (MLST) approaches. We found that there was high diversity in array length within both CRISPR1 (median = 22; min = 3; max = 79) and CRISPR2 (median = 27; min = 2; max = 221). There was also much diversity within serovars (e.g., arrays differed by as many as 50 repeat-spacer units among Salmonella ser. Senftenberg isolates). Interestingly, we found that there are two general cas gene profiles that do not track phylogenetic relationships, which suggests that non-vertical transmission events have occurred frequently throughout the evolutionary history of the sampled isolates. There is also considerable variation among the ranges of pairwise distances estimated within each cas gene, which may be indicative of the strength of natural selection acting on those genes. We developed a novel clustering approach based on CRISPR spacer content, but found that typing based on CRISPRs was less accurate than the MLST-based alternative; typing based on WGS data was the most accurate. Notwithstanding cost and accessibility, we anticipate that draft genome sequencing, due to its greater discriminatory power, will eventually become routine for traceback investigations.
Salmonella; Horizontal gene transfer; Evolution; CRISPR; Outbreak; Phylogeny; Whole genome sequencing; Typing
Salmonella enterica subsp. enterica serovar Typhimurium is a leading cause of salmonellosis. Here, we report a closed genome sequence, including sequences of 3 plasmids, of Salmonella serovar Typhimurium var. 5− CFSAN001921 (National Antimicrobial Resistance Monitoring System [NARMS] strain ID N30688), which was isolated from chicken breast meat and shows resistance to 10 different antimicrobials. Whole-genome and plasmid sequence analyses of this isolate will help enhance our understanding of this pathogenic multidrug-resistant serovar.
Comparative methods for analyzing whole genome sequence (WGS) data enable us to assess the genetic information available for reconstructing the evolutionary history of pathogens. We used the comparative approach to determine diagnostic genes for Salmonella enterica subspecies I. S. enterica subsp. I strains are known to infect warm-blooded organisms regularly while its close relatives tend to infect only cold-blooded organisms. We found 71 genes gained by the common ancestor of Salmonella enterica subspecies I and not subsequently lost by any member of this subspecies sequenced to date. These genes included many putative functional phenotypes. Twenty-seven of these genes are found only in Salmonella enterica subspecies I; we designed primers to test these genes for use as diagnostic sequence targets and data mined the NCBI Sequence Read Archive (SRA) database for draft genomes which carried these genes. We found that the sequence specificity and variability of these amplicons can be used to detect and discriminate among 317 different serovars and strains of Salmonella enterica subspecies I.
The enteric pathogen Salmonella enterica is one of the leading causes of foodborne illness in the world. The species is extremely diverse, containing more than 2,500 named serovars that are designated for their unique antigen characters and pathogenicity profiles—some are known to be virulent pathogens, while others are not. Questions regarding the evolution of pathogenicity, significance of antigen characters, diversity of clustered regularly interspaced short palindromic repeat (CRISPR) loci, among others, will remain elusive until a strong evolutionary framework is established. We present the first large-scale S. enterica subsp. enterica phylogeny inferred from a new reference-free k-mer approach of gathering single nucleotide polymorphisms (SNPs) from whole genomes. The phylogeny of 156 isolates representing 78 serovars (102 were newly sequenced) reveals two major lineages, each with many strongly supported sublineages. One of these lineages is the S. Typhi group; well nested within the phylogeny. Lineage-through-time analyses suggest there have been two instances of accelerated rates of diversification within the subspecies. We also found that antigen characters and CRISPR loci reveal different evolutionary patterns than that of the phylogeny, suggesting that a horizontal gene transfer or possibly a shared environmental acquisition might have influenced the present character distribution. Our study also shows the ability to extract reference-free SNPs from a large set of genomes and then to use these SNPs for phylogenetic reconstruction. This automated, annotation-free approach is an important step forward for bacterial disease tracking and in efficiently elucidating the evolutionary history of highly clonal organisms.
H antigens; serovar; O antigens; CRISPR; lineage-through-time plot; comparative method
Here, we report draft genomes of Paenibacillus alvei strains A6-6i and TS-15, which were isolated, respectively, from plant material and soil in the Virginia Eastern Shore (VES) tomato growing area. An array of genes related to antimicrobial biosynthetic pathways have been identified with whole-genome analyses of these strains.
An assay to identify the common food-borne pathogens Salmonella, Escherichia coli, Shigella, and Listeria monocytogenes was developed in collaboration with Ibis Biosciences (a division of Abbott Molecular) for the Plex-ID biosensor system, a platform that uses electrospray ionization mass spectroscopy (ESI-MS) to detect the base composition of short PCR amplicons. The new food-borne pathogen (FBP) plate has been experimentally designed using four gene segments for a total of eight amplicon targets. Initial work built a DNA base count database that contains more than 140 Salmonella enterica, 139 E. coli, 11 Shigella, and 36 Listeria patterns and 18 other Enterobacteriaceae organisms. This assay was tested to determine the scope of the assay's ability to detect and differentiate the enteric pathogens and to improve the reference database associated with the assay. More than 800 bacterial isolates of S. enterica, E. coli, and Shigella species were analyzed. Overall, 100% of S. enterica, 99% of E. coli, and 73% of Shigella spp. were detected using this assay. The assay was also able to identify 30% of the S. enterica serovars to the serovar level. To further characterize the assay, spiked food matrices and food samples collected during regulatory field work were also studied. While analysis of preenrichment media was inconsistent, identification of S. enterica from selective enrichment media resulted in serovar-level identifications for 8 of 10 regulatory samples. The results of this study suggest that this high-throughput method may be useful in clinical and regulatory laboratories testing for these pathogens.
Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project.
Facile laboratory tools are needed to augment identification in contamination events to trace the contamination back to the source (traceback) of Salmonella enterica subsp. enterica serovar Enteritidis (S. Enteritidis). Understanding the evolution and diversity within and among outbreak strains is the first step towards this goal. To this end, we collected 106 new S. Enteriditis isolates within S. Enteriditis Pulsed-Field Gel Electrophoresis (PFGE) pattern JEGX01.0004 and close relatives, and determined their genome sequences. Sources for these isolates spanned food, clinical and environmental farm sources collected during the 2010 S. Enteritidis shell egg outbreak in the United States along with closely related serovars, S. Dublin, S. Gallinarum biovar Pullorum and S. Gallinarum. Despite the highly homogeneous structure of this population, S. Enteritidis isolates examined in this study revealed thousands of SNP differences and numerous variable genes (n = 366). Twenty-one of these genes from the lineages leading to outbreak-associated samples had nonsynonymous (causing amino acid changes) changes and five genes are putatively involved in known Salmonella virulence pathways. While chromosome synteny and genome organization appeared to be stable among these isolates, genome size differences were observed due to variation in the presence or absence of several phages and plasmids, including phage RE-2010, phage P125109, plasmid pSEEE3072_19 (similar to pSENV), plasmid pOU1114 and two newly observed mobile plasmid elements pSEEE1729_15 and pSEEE0956_35. These differences produced modifications to the assembled bases for these draft genomes in the size range of approximately 4.6 to 4.8 mbp, with S. Dublin being larger (∼4.9 mbp) and S. Gallinarum smaller (4.55 mbp) when compared to S. Enteritidis. Finally, we identified variable S. Enteritidis genes associated with virulence pathways that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future outbreaks involving S. Enteritidis PFGE pattern JEGX01.0004.
Salmonella enterica is recognized as one of the most common bacterial agents of foodborne illness. We report draft genomes of four Salmonella serovar Heidelberg isolates associated with the recent multistate outbreak of human Salmonella Heidelberg infections linked to kosher broiled chicken livers in the United States in 2011. Isolates 2011K-1259 and 2011K-1232 were recovered from humans, whereas 2011K-1724 and 2011K-1726 were isolated from chicken liver. Whole genome sequence analysis of these isolates provides a tool for studying the short-term evolution of these epidemic clones and can be used for characterizing potentially new virulence factors.
Salmonella enterica serovar Heidelberg has caused numerous outbreaks in humans. Here, we report draft genomes of five isolates of serovar Heidelberg associated with the recent (2011) multistate outbreak linked to ground turkey in the United States. Isolates 2011K-1110 and 2011K-1132 were recovered from humans, while isolates 2011K-1138, 2011K-1224, and 2011K-1225 were recovered from ground turkey. Whole-genome sequence analysis of these isolates provides a tool for studying the short-term evolution of these epidemic clones.
Cheese contamination can occur at numerous stages in the manufacturing process including the use of improperly pasteurized or raw milk. Of concern is the potential contamination by Listeria monocytogenes and other pathogenic bacteria that find the high moisture levels and moderate pH of popular Latin-style cheeses like queso fresco a hospitable environment. In the investigation of a foodborne outbreak, samples typically undergo enrichment in broth for 24 hours followed by selective agar plating to isolate bacterial colonies for confirmatory testing. The broth enrichment step may also enable background microflora to proliferate, which can confound subsequent analysis if not inhibited by effective broth or agar additives. We used 16S rRNA gene sequencing to provide a preliminary survey of bacterial species associated with three brands of Latin-style cheeses after 24-hour broth enrichment.
Brand A showed a greater diversity than the other two cheese brands (Brands B and C) at nearly every taxonomic level except phylum. Brand B showed the least diversity and was dominated by a single bacterial taxon, Exiguobacterium, not previously reported in cheese. This genus was also found in Brand C, although Lactococcus was prominent, an expected finding since this bacteria belongs to the group of lactic acid bacteria (LAB) commonly found in fermented foods.
The contrasting diversity observed in Latin-style cheese was surprising, demonstrating that despite similarity of cheese type, raw materials and cheese making conditions appear to play a critical role in the microflora composition of the final product. The high bacterial diversity associated with Brand A suggests it may have been prepared with raw materials of high bacterial diversity or influenced by the ecology of the processing environment. Additionally, the presence of Exiguobacterium in high proportions (96%) in Brand B and, to a lesser extent, Brand C (46%), may have been influenced by the enrichment process. This study is the first to define Latin-style cheese microflora using Next-Generation Sequencing. These valuable preliminary data will direct selective tailoring of agar formulations to improve culture-based detection of pathogens in Latin-style cheese.
Latin-style cheese; Next Generation Sequencing; Microflora; Bacteria; Exiguobacterium
Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has, has aided comparative genomic studies by providing rare genomic changes, and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity with Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as describes composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.
carnivore; genome; SINE
Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult to resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shot-gun sequenced including diverse lineage representatives as well as numerous replicate clones to determine how much variability is due to bias, sequencing error, and or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study.
Intraserovar lineages of S. Montevideo differ by thousands of SNPs, that are only slightly less than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data.
Implementation of a validated pipeline for NGS data acquisition and analysis provides highly reproducible results that are stable and predictable for molecular epidemiological applications. When draft genomes are collected at 15×-20× coverage and passed through a quality filter as part of a data analysis pipeline, including sub-passaged replicates defined by a few SNPs, they can be accurately placed in a phylogenetic context. This reproducibility applies to all levels within and between serovars of Salmonella suggesting that investigators using these methods can have confidence in their conclusions.
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computational requirements and the negative consequences of systematic biases. To explore these issues and to clarify the phylogenetic relationships among mammals, we have analyzed a large data set of over 60 megabase pairs (Mb) of high-quality genomic sequence, which we generated from 41 mammals and 3 other vertebrates. All sequences are orthologous to a 1.9-Mb region of the human genome that encompasses the cystic fibrosis transmembrane conductance regulator gene (CFTR). To understand the characteristics and challenges associated with phylogenetic analyses of such a large data set, we partitioned the sequence data in several ways and utilized maximum likelihood, maximum parsimony, and Neighbor-Joining algorithms, implemented in parallel on Linux clusters. These studies yielded well-supported phylogenetic trees, largely confirming other recent molecular phylogenetic analyses. Our results provide support for rooting the placental mammal tree between Atlantogenata (Xenarthra and Afrotheria) and Boreoeutheria (Euarchontoglires and Laurasiatheria), illustrate the difficulty in resolving some branches even with large amounts of data (e.g., in the case of Laurasiatheria), and demonstrate the valuable role that very large comparative sequence data sets can play in refining our understanding of the evolutionary relationships of vertebrates.
Placentalia; Eutheria; Mammalia; mammalian phylogeny; phylogenomics; Atlantogenata; molecular systematics