Comparative methods for analyzing whole genome sequence (WGS) data enable us to assess the genetic information available for reconstructing the evolutionary history of pathogens. We used the comparative approach to determine diagnostic genes for Salmonella enterica subspecies I. S. enterica subsp. I strains are known to infect warm-blooded organisms regularly, whereas their close relatives tend to infect only cold-blooded organisms. We found 71 genes gained by the common ancestor of Salmonella enterica subspecies I and not subsequently lost by any member of this subspecies sequenced to date. These genes included many putative functional phenotypes. Twenty-seven of these genes are found only in Salmonella enterica subspecies I; we designed primers to test these genes for use as diagnostic sequence targets and mined the NCBI Sequence Read Archive (SRA) database for draft genomes that carried these genes. We found that the sequence specificity and variability of these amplicons can be used to detect and discriminate among 317 different serovars and strains of Salmonella enterica subspecies I.
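The two-step filter described above (genes retained by every member of the subspecies, then the subset never seen in relatives) can be illustrated with plain set operations. This is only a minimal sketch under stated assumptions: the strain names, gene names, and the function conserved_and_specific are hypothetical, and simple presence/absence sets stand in for the ancestral gain/loss reconstruction that the study itself relies on.

```python
def conserved_and_specific(ingroup, outgroup):
    """Return (core, specific) gene sets.

    core:     genes present in every ingroup genome
    specific: core genes that appear in no outgroup genome
    Both arguments map a genome name to a set of gene identifiers.
    """
    if not ingroup:
        raise ValueError("at least one ingroup genome is required")
    core = set.intersection(*(set(genes) for genes in ingroup.values()))
    seen_outside = set().union(*(set(genes) for genes in outgroup.values()))
    return core, core - seen_outside


# Toy presence/absence data (hypothetical, not from the study).
ingroup = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}
outgroup = {
    "relative_X": {"g2", "g6"},
    "relative_Y": {"g5", "g6"},
}

core, specific = conserved_and_specific(ingroup, outgroup)
print(sorted(core))      # ['g1', 'g2', 'g3']
print(sorted(specific))  # ['g1', 'g3']
```

In this toy case g2 is conserved in the ingroup but also occurs in a relative, so it would not qualify as a subspecies-specific diagnostic target.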
The enteric pathogen Salmonella enterica is one of the leading causes of foodborne illness in the world. The species is extremely diverse, containing more than 2,500 named serovars that are designated for their unique antigen characters and pathogenicity profiles; some are known to be virulent pathogens, while others are not. Questions regarding the evolution of pathogenicity, the significance of antigen characters, and the diversity of clustered regularly interspaced short palindromic repeat (CRISPR) loci, among others, will remain elusive until a strong evolutionary framework is established. We present the first large-scale S. enterica subsp. enterica phylogeny inferred from a new reference-free k-mer approach of gathering single nucleotide polymorphisms (SNPs) from whole genomes. The phylogeny of 156 isolates representing 78 serovars (102 were newly sequenced) reveals two major lineages, each with many strongly supported sublineages. One of these lineages is the S. Typhi group, which is well nested within the phylogeny. Lineage-through-time analyses suggest there have been two instances of accelerated rates of diversification within the subspecies. We also found that antigen characters and CRISPR loci reveal evolutionary patterns that differ from the phylogeny, suggesting that horizontal gene transfer or possibly a shared environmental acquisition might have influenced the present character distribution. Our study also shows the ability to extract reference-free SNPs from a large set of genomes and then to use these SNPs for phylogenetic reconstruction. This automated, annotation-free approach is an important step forward for bacterial disease tracking and in efficiently elucidating the evolutionary history of highly clonal organisms.
H antigens; serovar; O antigens; CRISPR; lineage-through-time plot; comparative method
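One common way to gather SNPs without a reference genome is to compare odd-length k-mers whose flanking bases match across genomes but whose central base differs. The sketch below illustrates that general idea only; it is not claimed to be the method used in the study. The function name kmer_snp_candidates, the toy sequences, and k = 5 are assumptions, and reverse complements and sequencing errors are ignored for brevity.

```python
from collections import defaultdict


def kmer_snp_candidates(genomes, k=5):
    """Find flank pairs shared by all genomes whose central base varies.

    genomes maps a name to a DNA string. Returns a dict mapping
    (left_flank, right_flank) to {genome_name: central_base}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    half = k // 2
    # (left flank, right flank) -> genome name -> set of central bases seen
    table = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            flank = (kmer[:half], kmer[half + 1:])
            table[flank][name].add(kmer[half])

    snps = {}
    for flank, per_genome in table.items():
        if len(per_genome) != len(genomes):
            continue  # locus missing from at least one genome
        if any(len(bases) != 1 for bases in per_genome.values()):
            continue  # ambiguous within a genome (e.g. repeated flanks)
        alleles = {name: next(iter(bases)) for name, bases in per_genome.items()}
        if len(set(alleles.values())) > 1:
            snps[flank] = alleles
    return snps


# Toy sequences (hypothetical): genome "b" differs from "a" and "c" at one site.
genomes = {
    "a": "GATTACAGGC",
    "b": "GATTGCAGGC",
    "c": "GATTACAGGC",
}
snps = kmer_snp_candidates(genomes, k=5)
print(snps)  # {('TT', 'CA'): {'a': 'A', 'b': 'G', 'c': 'A'}}
```

The resulting allele table can be turned into a SNP matrix (one column per locus) for tree building; real implementations typically use much longer k-mers and canonical (strand-independent) k-mer forms.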
The ability to detect a specific organism from a complex environment is vitally important to many fields of public health, including food safety. For example, tomatoes have been implicated numerous times as vehicles of foodborne outbreaks due to strains of Salmonella, but few studies have ever recovered Salmonella from a tomato phyllosphere environment. The precision of culturing techniques that target agents associated with outbreaks depends on numerous factors. One important factor to better understand is which species co-enrich during enrichment procedures and how microbial dynamics may impede or enhance detection of target pathogens. We used a shotgun sequence approach to describe taxa associated with samples pre-enrichment and throughout the enrichment steps of the Bacteriological Analytical Manual's (BAM) protocol for detection of Salmonella from environmental tomato samples. Recent work has shown that during efforts to enrich Salmonella (Proteobacteria) from tomato field samples, Firmicute genera are also co-enriched and at least one co-enriching Firmicute genus (Paenibacillus sp.) can inhibit and even kill strains of Salmonella. Here we provide a baseline description of microflora that co-culture during detection efforts and the utility of a bioinformatic approach to detect specific taxa from metagenomic sequence data. We observed that uncultured samples clustered together with distinct taxonomic profiles relative to the three cultured treatments (Universal Pre-enrichment broth (UPB), Tetrathionate (TT), and Rappaport-Vassiliadis (RV)). There was little consistency among samples exposed to the same culturing media, suggesting significant microbial differences in starting matrices or stochasticity associated with enrichment processes. Interestingly, Paenibacillus sp. (Salmonella inhibitor) was significantly enriched from uncultured to cultured (UPB) samples. Also of interest was the sequence-based identification of a number of sequences as Salmonella despite indication by all media that samples were culture negative for Salmonella. Our results substantiate the nascent utility of metagenomic methods to improve both biological and bioinformatic pathogen detection methods.
Here, we report draft genomes of Paenibacillus alvei strains A6-6i and TS-15, which were isolated, respectively, from plant material and soil in the Virginia Eastern Shore (VES) tomato growing area. An array of genes related to antimicrobial biosynthetic pathways have been identified with whole-genome analyses of these strains.
An assay to identify the common food-borne pathogens Salmonella, Escherichia coli, Shigella, and Listeria monocytogenes was developed in collaboration with Ibis Biosciences (a division of Abbott Molecular) for the Plex-ID biosensor system, a platform that uses electrospray ionization mass spectrometry (ESI-MS) to detect the base composition of short PCR amplicons. The new food-borne pathogen (FBP) plate has been experimentally designed using four gene segments for a total of eight amplicon targets. Initial work built a DNA base count database that contains more than 140 Salmonella enterica, 139 E. coli, 11 Shigella, and 36 Listeria patterns and 18 other Enterobacteriaceae organisms. This assay was tested to determine the scope of the assay's ability to detect and differentiate the enteric pathogens and to improve the reference database associated with the assay. More than 800 bacterial isolates of S. enterica, E. coli, and Shigella species were analyzed. Overall, 100% of S. enterica, 99% of E. coli, and 73% of Shigella spp. were detected using this assay. The assay was also able to identify 30% of the S. enterica serovars to the serovar level. To further characterize the assay, spiked food matrices and food samples collected during regulatory field work were also studied. While analysis of preenrichment media was inconsistent, identification of S. enterica from selective enrichment media resulted in serovar-level identifications for 8 of 10 regulatory samples. The results of this study suggest that this high-throughput method may be useful in clinical and regulatory laboratories testing for these pathogens.
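The idea of identifying an organism from the base composition of an amplicon, rather than from its full sequence, can be shown with a short sketch. Everything here is illustrative: the amplicon sequences, the reference names, and the functions base_composition and match_signature are hypothetical, and the real platform infers base counts from measured masses rather than from known sequences.

```python
from collections import Counter


def base_composition(amplicon):
    """Return the (A, C, G, T) counts of a DNA sequence."""
    counts = Counter(amplicon.upper())
    return tuple(counts.get(base, 0) for base in "ACGT")


def match_signature(amplicon, database):
    """Return the names of reference entries whose base counts match."""
    signature = base_composition(amplicon)
    return sorted(name for name, ref in database.items() if ref == signature)


# Hypothetical base-count database for a single amplicon target.
database = {
    "organism_1": (3, 2, 3, 2),
    "organism_2": (4, 2, 2, 2),
}

observed = "ACGTACGGTA"  # toy amplicon, not a real target sequence
print(base_composition(observed))           # (3, 2, 3, 2)
print(match_signature(observed, database))  # ['organism_1']

# Base composition ignores base order, so distinct sequences can collide.
print(base_composition("AACCGGGTTA"))       # (3, 2, 3, 2)
```

Because a single base-count signature can be shared by different sequences, combining signatures from several amplicon targets, as the eight-target plate described above does, is one plausible way to increase resolution.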
Non-O157 Shiga toxin-producing Escherichia coli (STEC) strains are emerging food-borne pathogens causing life-threatening diseases and food-borne outbreaks. A better understanding of their evolution provides a framework for developing tools to control food safety. We obtained 15 genomes of non-O157 STEC strains, including O26, O111, and O103 strains. Phylogenetic trees revealed a close relationship between O26:H11 and O111:H11 and a scattered distribution of O111. We hypothesize that STEC serotypes with the same H antigens might share common ancestors.
Research to understand and control microbiological risks associated with the consumption of fresh fruits and vegetables has examined many environments in the farm to fork continuum. An important data gap that remains poorly studied, however, is the baseline description of microflora that may be associated with plant anatomy, either endemically or in response to environmental pressures. Specific anatomical niches of plants may contribute to persistence of human pathogens in agricultural environments in ways we have yet to describe. Tomatoes have been implicated in outbreaks of Salmonella at least 17 times during the years spanning 1990 to 2010. Our research seeks to provide a baseline description of the tomato microbiome and possibly identify whether or not there is something distinctive about tomatoes or their growing ecology that contributes to persistence of Salmonella in this important food crop.
DNA was recovered from washes of epiphytic surfaces of tomato anatomical organs (leaves, stems, roots, flowers and fruits) of Solanum lycopersicum (BHN602), grown at a site in close proximity to commercial farms previously implicated in tomato-Salmonella outbreaks. DNA was amplified for targeted 16S and 18S rRNA genes and sheared for shotgun metagenomic sequencing. Amplicons and metagenomes were used to describe “native” bacterial microflora for diverse anatomical parts of Virginia-grown tomatoes.
Distinct groupings of microbial communities were associated with different tomato plant organs and a gradient of compositional similarity could be correlated to the distance of a given plant part from the soil. Unique bacterial phylotypes (at 95% identity) were associated with fruits and flowers of tomato plants. These include Microvirga, Pseudomonas, Sphingomonas, Brachybacterium, Rhizobiales, Paracoccus, Chryseomonas and Microbacterium. The most frequently observed bacterial taxa across aerial plant regions were Pseudomonas and Xanthomonas. Dominant fungal taxa that could be identified to genus with 18S amplicons included Hypocrea, Aureobasidium and Cryptococcus. No definitive presence of Salmonella could be confirmed in any of the plant samples, although 16S sequences suggested that closely related genera were present on leaves, fruits and roots.
Tomato microflora; 16S; 18S; Metagenomics; Phyllosphere; Solanum lycopersicum; Tomato organs; Microbial ecology; Baseline microflora; Tomatome
Shiga toxin-producing Escherichia coli (STEC) causes severe illness in humans, including hemorrhagic colitis and hemolytic uremic syndrome. A parallel evolutionary model was proposed in which E. coli strains of distinct phylogenies independently integrate Shiga toxin-encoding genes and evolve into STEC. We report the draft genomes of two emerging non-O157 STEC strains.
Salmonellosis contributes significantly to the public health burden globally. Salmonella enterica serotype Newport is among the Salmonella serotypes most frequently associated with food-borne illness in the United States and China. It was thought to be polyphyletic and to contain different lineages. We report draft genomes of four S. Newport strains isolated from humans in China.
Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project.
We report a closed genome of Salmonella enterica subsp. enterica serovar Javiana (S. Javiana). This serotype is a common food-borne pathogen and is often associated with fresh-cut produce. Complete (finished) genome assemblies will support pilot studies testing the utility of next-generation sequencing (NGS) technologies in public health laboratories.
Salmonellosis is a major contributor to the global public health burden. Salmonella enterica serotype Newport has ranked among the three Salmonella serotypes most commonly associated with food-borne outbreaks in the United States. It was thought to be polyphyletic and composed of independent lineages. Here we report draft genomes of eight strains of S. Newport from diverse hosts and locations.
Salmonellosis has been one of the major contributors to the global public health burden. Salmonella enterica serotype Agona has ranked among the top 10 and top 20 most frequent Salmonella serotypes isolated from human sources in China and the United States, respectively. We report draft genomes of three S. Agona strains from China.
Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing procedures to obtain 16–24× coverage of high quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes) and 15 outgroup genomes identified more than 140,000 informative SNPs. A resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between invH and mutS genes at the 3′ end of Salmonella Pathogenicity Island 1 (SPI-1), ste fimbrial operon, and Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR) associated-proteins (cas). These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region and our data demonstrated genetic flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S. Newport and also provided additional markers for epidemiological response.
Facile laboratory tools are needed to augment identification in contamination events to trace the contamination back to the source (traceback) of Salmonella enterica subsp. enterica serovar Enteritidis (S. Enteritidis). Understanding the evolution and diversity within and among outbreak strains is the first step towards this goal. To this end, we collected 106 new S. Enteritidis isolates within S. Enteritidis Pulsed-Field Gel Electrophoresis (PFGE) pattern JEGX01.0004 and close relatives, and determined their genome sequences. Sources for these isolates spanned food, clinical and environmental farm sources collected during the 2010 S. Enteritidis shell egg outbreak in the United States along with closely related serovars, S. Dublin, S. Gallinarum biovar Pullorum and S. Gallinarum. Despite the highly homogeneous structure of this population, S. Enteritidis isolates examined in this study revealed thousands of SNP differences and numerous variable genes (n = 366). Twenty-one of these genes from the lineages leading to outbreak-associated samples had nonsynonymous (causing amino acid changes) changes and five genes are putatively involved in known Salmonella virulence pathways. While chromosome synteny and genome organization appeared to be stable among these isolates, genome size differences were observed due to variation in the presence or absence of several phages and plasmids, including phage RE-2010, phage P125109, plasmid pSEEE3072_19 (similar to pSENV), plasmid pOU1114 and two newly observed mobile plasmid elements pSEEE1729_15 and pSEEE0956_35. These differences produced modifications to the assembled bases for these draft genomes in the size range of approximately 4.6 to 4.8 Mbp, with S. Dublin being larger (∼4.9 Mbp) and S. Gallinarum smaller (4.55 Mbp) when compared to S. Enteritidis. Finally, we identified variable S. Enteritidis genes associated with virulence pathways that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future outbreaks involving S. Enteritidis PFGE pattern JEGX01.0004.
Salmonella enterica is recognized as one of the most common bacterial agents of foodborne illness. We report draft genomes of four Salmonella serovar Heidelberg isolates associated with the recent multistate outbreak of human Salmonella Heidelberg infections linked to kosher broiled chicken livers in the United States in 2011. Isolates 2011K-1259 and 2011K-1232 were recovered from humans, whereas 2011K-1724 and 2011K-1726 were isolated from chicken liver. Whole genome sequence analysis of these isolates provides a tool for studying the short-term evolution of these epidemic clones and can be used for characterizing potentially new virulence factors.
Salmonella enterica serovar Heidelberg has caused numerous outbreaks in humans. Here, we report draft genomes of five isolates of serovar Heidelberg associated with the recent (2011) multistate outbreak linked to ground turkey in the United States. Isolates 2011K-1110 and 2011K-1132 were recovered from humans, while isolates 2011K-1138, 2011K-1224, and 2011K-1225 were recovered from ground turkey. Whole-genome sequence analysis of these isolates provides a tool for studying the short-term evolution of these epidemic clones.
Lymphoid infiltration is a prognostic marker in solid tumors, such as colorectal, breast and lung carcinomas. However, lymphoid infiltration is heterogeneous and the reproducibility of quantification based on single counts within a tumor is very low. We aimed to develop a reproducible method for evaluating lymphoid infiltration in tumors.
Virtual slides were obtained from tissue sections from the localized colorectal carcinomas of 117 patients, stained for CD3 and CD45R0. We assessed the variation of lymphoid cell density by automatic counts in 1 mm-wide, 5 μm-long segments of the invasive front, along an axis 4 mm in length running perpendicular to the invasive front of the tumor.
We plotted curves of the variation of lymphocyte density across the tumor front. Three distinct patterns emerged from this linear quantification of lymphocyte infiltration (LQLI). In pattern 1, there was a high density of lymphocytes within the tumor. In pattern 2, lymphocyte density peaked close to the invasive margin. In pattern 3, lymphocytes were diffusely distributed, at low density. It was possible to classify all the tumors studied, and interobserver reproducibility was excellent (kappa = 0.9). By contrast, single counts of CD3+ cells on tissue microarrays were highly variable for a given LQLI pattern, confirming the heterogeneity of lymphoid infiltration within individual tumors. In univariate analysis, all pathologic features (stage, metastatic lymph node ratio (LNR), vascular embolism, perineural invasion), CD3+ cell density, and LQLI patterns for CD3+ and CD45R0+ cells were found to have a significant effect on disease-free survival (DFS). In multivariate analysis, only the LQLI pattern for CD3+ cells (HR: 6.02; 95% CI: 2.74-13.18) and metastatic lymph node ratio (HR: 6.14; 95% CI: 2.32-16.2) were associated with DFS.
LQLI is an automated, reproducible method for the assessment of lymphoid infiltration. However, validation of its prognostic value in larger series is required before its introduction into routine practice for prognostic evaluation in patients with colorectal carcinomas.
Tumor infiltration; Lymphocytes; Invasive margin; Linear quantification; Colorectal cancer; Image analysis; Automated count
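Interobserver reproducibility of a categorical classification, such as assigning each tumor to LQLI pattern 1, 2 or 3, is commonly summarized with a kappa statistic. The sketch below assumes Cohen's kappa for two observers, which the abstract does not specify, and the pattern labels assigned by the two raters are invented for illustration only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical pattern assignments (1, 2 or 3) for ten tumors.
observer_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
observer_2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2]

kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 2))  # 0.85
```

In this toy example the observers agree on 9 of 10 tumors, yet kappa is lower than 0.9 because part of that agreement would be expected by chance.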
Cheese contamination can occur at numerous stages in the manufacturing process including the use of improperly pasteurized or raw milk. Of concern is the potential contamination by Listeria monocytogenes and other pathogenic bacteria that find the high moisture levels and moderate pH of popular Latin-style cheeses like queso fresco a hospitable environment. In the investigation of a foodborne outbreak, samples typically undergo enrichment in broth for 24 hours followed by selective agar plating to isolate bacterial colonies for confirmatory testing. The broth enrichment step may also enable background microflora to proliferate, which can confound subsequent analysis if not inhibited by effective broth or agar additives. We used 16S rRNA gene sequencing to provide a preliminary survey of bacterial species associated with three brands of Latin-style cheeses after 24-hour broth enrichment.
Brand A showed greater diversity than the other two cheese brands (Brands B and C) at nearly every taxonomic level except phylum. Brand B showed the least diversity and was dominated by a single bacterial taxon, Exiguobacterium, not previously reported in cheese. This genus was also found in Brand C, although Lactococcus was prominent, an expected finding since this bacterium belongs to the group of lactic acid bacteria (LAB) commonly found in fermented foods.
The contrasting diversity observed in Latin-style cheese was surprising, demonstrating that despite similarity of cheese type, raw materials and cheese making conditions appear to play a critical role in the microflora composition of the final product. The high bacterial diversity associated with Brand A suggests it may have been prepared with raw materials of high bacterial diversity or influenced by the ecology of the processing environment. Additionally, the presence of Exiguobacterium in high proportions (96%) in Brand B and, to a lesser extent, Brand C (46%), may have been influenced by the enrichment process. This study is the first to define Latin-style cheese microflora using Next-Generation Sequencing. These valuable preliminary data will direct selective tailoring of agar formulations to improve culture-based detection of pathogens in Latin-style cheese.
Latin-style cheese; Next Generation Sequencing; Microflora; Bacteria; Exiguobacterium
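A contrast in diversity such as the one between a sample dominated by one genus and a more even community can be quantified with a diversity index computed from taxon counts. The sketch below uses the Shannon index as one possible choice; the abstract does not state which diversity measure was used, and the counts are illustrative (the 96:4 split merely echoes the reported proportion of Exiguobacterium in Brand B, with all other taxa lumped together as an assumption).

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


# Illustrative read counts per taxon (hypothetical).
dominated = {"Exiguobacterium": 96, "other taxa": 4}
even = {"taxon_1": 25, "taxon_2": 25, "taxon_3": 25, "taxon_4": 25}

print(shannon_index(dominated) < shannon_index(even))  # True
```

A community of four equally abundant taxa reaches the maximum value ln(4) for four taxa, whereas the dominated profile scores far lower.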
Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity within Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as their composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.
carnivore; genome; SINE
Enriching environmental samples to increase the probability of detection has been standard practice throughout the history of microbiology. However, by its very nature, the process of enrichment creates a biased sample that may have unintended consequences for surveillance or resolving a pathogenic outbreak. With the advent of next-generation sequencing and metagenomic approaches, the possibility now exists to quantify enrichment bias at an unprecedented taxonomic breadth.
We investigated differences in taxonomic profiles of three enriched and unenriched tomato phyllosphere samples taken from three different tomato fields (n = 18). 16S rRNA gene metagenomes were created for each of the 18 samples using 454/Roche’s pyrosequencing platform, resulting in a total of 165,259 sequences. Significantly different taxonomic profiles and abundances at a number of taxonomic levels were observed between the two treatments. Although as many as 28 putative Salmonella sequences were detected in enriched samples, there was no significant difference in the abundance of Salmonella between enriched and unenriched treatments.
Our results illustrate that the process of enriching greatly alters the taxonomic profile of an environmental sample beyond that of the target organism. We also found evidence suggesting that enrichment may not increase the probability of detecting a target. In conclusion, our results further emphasize the need to develop metagenomics as a validated culture independent method for pathogen detection.
Enrichment bias; Metagenomics; Pathogen; Taxonomy
Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult to resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shotgun sequenced, including diverse lineage representatives as well as numerous replicate clones, to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study.
Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data.
Implementation of a validated pipeline for NGS data acquisition and analysis provides highly reproducible results that are stable and predictable for molecular epidemiological applications. When draft genomes are collected at 15×-20× coverage and passed through a quality filter as part of a data analysis pipeline, including sub-passaged replicates defined by a few SNPs, they can be accurately placed in a phylogenetic context. This reproducibility applies to all levels within and between serovars of Salmonella suggesting that investigators using these methods can have confidence in their conclusions.
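As a worked illustration of what 15× to 20× coverage implies, average depth is the total number of sequenced bases divided by the genome length. The numbers below are assumptions chosen only to make the arithmetic concrete: a genome size of 4.8 Mbp (in the range typical for Salmonella) and a mean read length of 400 bp; neither value is taken from this abstract.

```python
import math


def mean_coverage(n_reads, mean_read_length, genome_size):
    """Average sequencing depth: total bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


def reads_needed(target_coverage, mean_read_length, genome_size):
    """Smallest number of reads that reaches the target average depth."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_size / mean_read_length)


GENOME_SIZE = 4_800_000  # bp, assumed for illustration
READ_LENGTH = 400        # bp, assumed for illustration

low = reads_needed(15, READ_LENGTH, GENOME_SIZE)
high = reads_needed(20, READ_LENGTH, GENOME_SIZE)
print(low, high)  # 180000 240000
```

Under these assumptions, roughly 180,000 to 240,000 reads per isolate would fall in the stated coverage range; actual requirements depend on read-length distribution and on how many reads survive quality filtering.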
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computational requirements and the negative consequences of systematic biases. To explore these issues and to clarify the phylogenetic relationships among mammals, we have analyzed a large data set of over 60 megabase pairs (Mb) of high-quality genomic sequence, which we generated from 41 mammals and 3 other vertebrates. All sequences are orthologous to a 1.9-Mb region of the human genome that encompasses the cystic fibrosis transmembrane conductance regulator gene (CFTR). To understand the characteristics and challenges associated with phylogenetic analyses of such a large data set, we partitioned the sequence data in several ways and utilized maximum likelihood, maximum parsimony, and Neighbor-Joining algorithms, implemented in parallel on Linux clusters. These studies yielded well-supported phylogenetic trees, largely confirming other recent molecular phylogenetic analyses. Our results provide support for rooting the placental mammal tree between Atlantogenata (Xenarthra and Afrotheria) and Boreoeutheria (Euarchontoglires and Laurasiatheria), illustrate the difficulty in resolving some branches even with large amounts of data (e.g., in the case of Laurasiatheria), and demonstrate the valuable role that very large comparative sequence data sets can play in refining our understanding of the evolutionary relationships of vertebrates.
Placentalia; Eutheria; Mammalia; mammalian phylogeny; phylogenomics; Atlantogenata; molecular systematics