Results 1-8 (8)

1.  An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella 
PeerJ  2014;2:e620.
Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms where the MiSeq had the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. 
The methods supported by our results represent a preliminary set of guidelines and a step towards developing validated standards for clustering based on whole genome sequence data.
PMCID: PMC4201946  PMID: 25332847
Salmonella; Outbreak; Congruence; Phylogenetics; Next generation sequencing; Single nucleotide polymorphism
2.  The evolutionary history and diagnostic utility of the CRISPR-Cas system within Salmonella enterica ssp. enterica 
PeerJ  2014;2:e340.
Evolutionary studies of clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (cas) genes can provide insights into host-pathogen co-evolutionary dynamics and the frequency at which different genomic events (e.g., horizontal vs. vertical transmission) occur. Within this study, we used whole genome sequence (WGS) data to determine the evolutionary history and genetic diversity of CRISPR loci and cas genes among a diverse set of 427 Salmonella enterica ssp. enterica isolates representing 64 different serovars. We also evaluated the performance of CRISPR loci for typing when compared to whole genome and multilocus sequence typing (MLST) approaches. We found that there was high diversity in array length within both CRISPR1 (median = 22; min = 3; max = 79) and CRISPR2 (median = 27; min = 2; max = 221). There was also much diversity within serovars (e.g., arrays differed by as many as 50 repeat-spacer units among Salmonella ser. Senftenberg isolates). Interestingly, we found that there are two general cas gene profiles that do not track phylogenetic relationships, which suggests that non-vertical transmission events have occurred frequently throughout the evolutionary history of the sampled isolates. There is also considerable variation among the ranges of pairwise distances estimated within each cas gene, which may be indicative of the strength of natural selection acting on those genes. We developed a novel clustering approach based on CRISPR spacer content, but found that typing based on CRISPRs was less accurate than the MLST-based alternative; typing based on WGS data was the most accurate. Notwithstanding cost and accessibility, we anticipate that draft genome sequencing, due to its greater discriminatory power, will eventually become routine for traceback investigations.
PMCID: PMC3994646  PMID: 24765574
Salmonella; Horizontal gene transfer; Evolution; CRISPR; Outbreak; Phylogeny; Whole genome sequencing; Typing
3.  Phylogenetic Diversity of the Enteric Pathogen Salmonella enterica subsp. enterica Inferred from Genome-Wide Reference-Free SNP Characters 
Genome Biology and Evolution  2013;5(11):2109-2123.
The enteric pathogen Salmonella enterica is one of the leading causes of foodborne illness in the world. The species is extremely diverse, containing more than 2,500 named serovars that are designated for their unique antigen characters and pathogenicity profiles; some are known to be virulent pathogens, while others are not. Questions regarding the evolution of pathogenicity, significance of antigen characters, diversity of clustered regularly interspaced short palindromic repeat (CRISPR) loci, among others, will remain elusive until a strong evolutionary framework is established. We present the first large-scale S. enterica subsp. enterica phylogeny inferred from a new reference-free k-mer approach of gathering single nucleotide polymorphisms (SNPs) from whole genomes. The phylogeny of 156 isolates representing 78 serovars (102 were newly sequenced) reveals two major lineages, each with many strongly supported sublineages. One of these lineages is the S. Typhi group, which is well nested within the phylogeny. Lineage-through-time analyses suggest there have been two instances of accelerated rates of diversification within the subspecies. We also found that antigen characters and CRISPR loci reveal evolutionary patterns that differ from those of the phylogeny, suggesting that horizontal gene transfer or possibly a shared environmental acquisition might have influenced the present character distribution. Our study also shows the ability to extract reference-free SNPs from a large set of genomes and then to use these SNPs for phylogenetic reconstruction. This automated, annotation-free approach is an important step forward for bacterial disease tracking and in efficiently elucidating the evolutionary history of highly clonal organisms.
PMCID: PMC3845640  PMID: 24158624
H antigens; serovar; O antigens; CRISPR; lineage-through-time plot; comparative method
4.  Co-Enriching Microflora Associated with Culture Based Methods to Detect Salmonella from Tomato Phyllosphere 
PLoS ONE  2013;8(9):e73079.
The ability to detect a specific organism from a complex environment is vitally important to many fields of public health, including food safety. For example, tomatoes have been implicated numerous times as vehicles of foodborne outbreaks due to strains of Salmonella, but few studies have ever recovered Salmonella from a tomato phyllosphere environment. The precision of culturing techniques that target agents associated with outbreaks depends on numerous factors. One important factor to better understand is which species co-enrich during enrichment procedures and how microbial dynamics may impede or enhance detection of target pathogens. We used a shotgun sequence approach to describe taxa associated with samples pre-enrichment and throughout the enrichment steps of the Bacteriological Analytical Manual's (BAM) protocol for detection of Salmonella from environmental tomato samples. Recent work has shown that during efforts to enrich Salmonella (Proteobacteria) from tomato field samples, Firmicute genera are also co-enriched and at least one co-enriching Firmicute genus (Paenibacillus sp.) can inhibit and even kill strains of Salmonella. Here we provide a baseline description of microflora that co-culture during detection efforts and the utility of a bioinformatic approach to detect specific taxa from metagenomic sequence data. We observed that uncultured samples clustered together with distinct taxonomic profiles relative to the three cultured treatments (Universal Pre-enrichment broth (UPB), Tetrathionate (TT), and Rappaport-Vassiliadis (RV)). There was little consistency among samples exposed to the same culture media, suggesting significant microbial differences in starting matrices or stochasticity associated with enrichment processes. Interestingly, Paenibacillus sp. (Salmonella inhibitor) was significantly enriched from uncultured to cultured (UPB) samples. Also of interest was the sequence-based identification of a number of sequences as Salmonella despite indication by all media that samples were culture negative for Salmonella. Our results substantiate the nascent utility of metagenomic methods to improve both biological and bioinformatic pathogen detection methods.
PMCID: PMC3767688  PMID: 24039862
5.  Elucidating the evolutionary history and expression patterns of nucleoside phosphorylase paralogs (vegetative storage proteins) in Populus and the plant kingdom 
BMC Plant Biology  2013;13:118.
Nucleoside phosphorylases (NPs) have been extensively investigated in human and bacterial systems for their role in metabolic nucleotide salvaging and links to oncogenesis. In plants, NP-like proteins have not been comprehensively studied, likely because there is no evidence of a metabolic function in nucleoside salvage. However, in the forest tree genus Populus, a family of NP-like proteins functions as an important ecophysiological adaptation for inter- and intra-seasonal nitrogen storage and cycling.
We conducted phylogenetic analyses to determine the distribution and evolution of NP-like proteins in plants. These analyses revealed two major clusters of NP-like proteins in plants. Group I proteins were encoded by genes across a wide range of plant taxa while proteins encoded by Group II genes were dominated by species belonging to the order Malpighiales and included the Populus Bark Storage Protein (BSP) and WIN4-like proteins. Additionally, we evaluated the NP-like genes in Populus by examining the transcript abundance of the 13 NP-like genes found in the Populus genome in various tissues of plants exposed to long-day (LD) and short-day (SD) photoperiods. We found that all 13 of the Populus NP-like genes belonging to either Group I or II are expressed in various tissues in both LD and SD conditions. Tests of natural selection and expression evolution analysis of the Populus genes suggest that divergence in gene expression may have occurred recently during the evolution of Populus, which supports the adaptive maintenance models. Lastly, in silico analysis of cis-regulatory elements in the promoters of the 13 NP-like genes in Populus revealed common regulatory elements known to be involved in light regulation, stress/pathogenesis and phytohormone responses.
In Populus, the evolution of the NP-like protein and gene family has been shaped by duplication events and natural selection. Expression data suggest that previously uncharacterized NP-like proteins may function in nutrient sensing and/or signaling. These proteins are members of Group I NP-like proteins, which are widely distributed in many plant taxa. We conclude that NP-like proteins may function in plants, although this function is undefined.
PMCID: PMC3751785  PMID: 23957885
Nucleoside phosphorylases; Vegetative storage proteins; Bark storage proteins; Nitrogen cycling; Populus trichocarpa
6.  Baseline survey of the anatomical microbial ecology of an important food plant: Solanum lycopersicum (tomato) 
BMC Microbiology  2013;13:114.
Research to understand and control microbiological risks associated with the consumption of fresh fruits and vegetables has examined many environments in the farm-to-fork continuum. An important data gap that remains poorly studied, however, is the baseline description of microflora that may be associated with plant anatomy either endemically or in response to environmental pressures. Specific anatomical niches of plants may contribute to persistence of human pathogens in agricultural environments in ways we have yet to describe. Tomatoes have been implicated in outbreaks of Salmonella at least 17 times during the years spanning 1990 to 2010. Our research seeks to provide a baseline description of the tomato microbiome and possibly identify whether or not there is something distinctive about tomatoes or their growing ecology that contributes to persistence of Salmonella in this important food crop.
DNA was recovered from washes of epiphytic surfaces of tomato anatomical organs (leaves, stems, roots, flowers and fruits) of Solanum lycopersicum (BHN602) grown at a site in close proximity to commercial farms previously implicated in tomato-Salmonella outbreaks. DNA was amplified for targeted 16S and 18S rRNA genes and sheared for shotgun metagenomic sequencing. Amplicons and metagenomes were used to describe “native” bacterial microflora for diverse anatomical parts of Virginia-grown tomatoes.
Distinct groupings of microbial communities were associated with different tomato plant organs and a gradient of compositional similarity could be correlated to the distance of a given plant part from the soil. Unique bacterial phylotypes (at 95% identity) were associated with fruits and flowers of tomato plants. These include Microvirga, Pseudomonas, Sphingomonas, Brachybacterium, Rhizobiales, Paracoccus, Chryseomonas and Microbacterium. The most frequently observed bacterial taxa across aerial plant regions were Pseudomonas and Xanthomonas. Dominant fungal taxa that could be identified to genus with 18S amplicons included Hypocrea, Aureobasidium and Cryptococcus. No definitive presence of Salmonella could be confirmed in any of the plant samples, although 16S sequences suggested that closely related genera were present on leaves, fruits and roots.
PMCID: PMC3680157  PMID: 23705801
Tomato microflora; 16S; 18S; Metagenomics; Phyllosphere; Solanum lycopersicum; Tomato organs; Microbial ecology; Baseline microflora; Tomatome
7.  Using metagenomic analyses to estimate the consequences of enrichment bias for pathogen detection 
BMC Research Notes  2012;5:378.
Enriching environmental samples to increase the probability of detection has been standard practice throughout the history of microbiology. However, by its very nature, the process of enrichment creates a biased sample that may have unintended consequences for surveillance or resolving a pathogenic outbreak. With the advent of next-generation sequencing and metagenomic approaches, the possibility now exists to quantify enrichment bias at an unprecedented taxonomic breadth.
We investigated differences in taxonomic profiles of three enriched and unenriched tomato phyllosphere samples taken from three different tomato fields (n = 18). 16S rRNA gene metagenomes were created for each of the 18 samples using 454/Roche’s pyrosequencing platform, resulting in a total of 165,259 sequences. Significantly different taxonomic profiles and abundances at a number of taxonomic levels were observed between the two treatments. Although as many as 28 putative Salmonella sequences were detected in enriched samples, there was no significant difference in the abundance of Salmonella between enriched and unenriched treatments.
Our results illustrate that the process of enriching greatly alters the taxonomic profile of an environmental sample beyond that of the target organism. We also found evidence suggesting that enrichment may not increase the probability of detecting a target. In conclusion, our results further emphasize the need to develop metagenomics as a validated culture independent method for pathogen detection.
PMCID: PMC3441234  PMID: 22839680
Enrichment bias; Metagenomics; Pathogen; Taxonomy
8.  Phylogenetic patterns and conservation among North American members of the genus Agalinis (Orobanchaceae) 
North American Agalinis Raf. species represent a taxonomically challenging group and there have been extensive historical revisions at the species, section, and subsection levels of classification. The genus contains many rare species, including the federally listed endangered species Agalinis acuta. In addition to evaluating the degree to which historical classifications at the section and subsection levels are supported by molecular data sampled from 79 individuals representing 29 Agalinis species, we assessed the monophyly of 27 species by sampling multiple individuals representing different populations of those species. Twenty-one of these species are of conservation concern in at least some part of their range.
Phylogenetic relationships estimated using maximum likelihood analyses of seven chloroplast DNA loci (aligned length = 11 076 base pairs (bp)) and the nuclear ribosomal DNA ITS (internal transcribed spacer) locus (733 bp) indicated no support for the historically recognized sections except for Section Erectae. Our results suggest that North American members of the genus comprise six major lineages; however, we were not able to resolve branching order among many of these lineages. Monophyly of 24 of the 29 sampled species was supported based on significant branch lengths of and high bootstrap support for subtending branches. However, there was no statistical support for the monophyly of A. acuta with respect to Agalinis tenella and Agalinis decemloba. Although most species were supported, deeper relationships among many species remain ambiguous.
The North American Agalinis species sampled form a well-supported, monophyletic group within the family Orobanchaceae relative to the outgroups sampled. Most hypotheses regarding section- and subsection-level relationships based on morphology were not supported, and taxonomic revisions are warranted. Lack of support for monophyly of Agalinis acuta leaves the important question regarding its taxonomic status unanswered. Lack of resolution is potentially due to incomplete lineage sorting of ancestral polymorphisms among recently diverged species; however, the gene regions examined did distinguish among almost all other species in the genus. Due to the important policy implications of this finding, we are further evaluating the evolutionary distinctiveness of A. acuta using morphological data and loci with higher mutation rates.
PMCID: PMC2564944  PMID: 18822144
