2.  Resolution of a Protracted Serogroup B Meningococcal Outbreak with Whole-Genome Sequencing Shows Interspecies Genetic Transfer 
Journal of Clinical Microbiology  2016;54(12):2891-2899.
A carriage study was undertaken (n = 112) to ascertain the prevalence of Neisseria spp. following the eighth case of invasive meningococcal disease in young children (5 to 46 months) and members of a large extended indigenous ethnic minority Traveller family (n = 123), typically associated with high-occupancy living conditions. Nested multilocus sequence typing (MLST) was employed for case specimen extracts. Isolates were genome sequenced and then were assembled de novo and deposited into the Bacterial Isolate Genome Sequencing Database (BIGSdb). This facilitated an expanded MLST approach utilizing large numbers of loci for isolate characterization and discrimination. A rare sequence type, ST-6697, predominated in disease specimens and isolates that were carried (n = 8/14), persisting for at least 44 months, likely driven by the high population density of houses (n = 67/112) and trailers (n = 45/112). Carriage for Neisseria meningitidis (P < 0.05) and Neisseria lactamica (P < 0.002) (2-sided Fisher's exact test) was more likely in the smaller, more densely populated trailers. Meningococcal carriage was highest in 24- to 39-year-olds (45%, n = 9/20). Evidence of horizontal gene transfer (HGT) was observed in four individuals cocolonized by Neisseria lactamica and Neisseria meningitidis. One HGT event resulted in the acquisition of 26 consecutive N. lactamica alleles. This study demonstrates how housing density can drive meningococcal transmission and carriage, which likely facilitated the persistence of ST-6697 and prolonged the outbreak. Whole-genome MLST effectively distinguished between highly similar outbreak strain isolates, including those isolated from person-to-person transmission, and also highlighted how a few HGT events can distort the true phylogenetic relationship between highly similar clonal isolates.
PMCID: PMC5121376  PMID: 27629899
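The expanded MLST comparison described in the abstract above can be illustrated with a minimal sketch. It assumes each isolate is represented as a mapping from locus name to allele number; the locus names, allele numbers, and the function `allelic_distance` are hypothetical and are not taken from the study or from BIGSdb.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci at which two isolates carry different allele numbers.

    Each profile maps a locus name to an allele number, or to None when the
    locus was not found in that assembly. Loci missing from either isolate
    are skipped rather than counted as differences.
    """
    differences = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            differences += 1
    return differences


# Hypothetical profiles over five loci (real schemes use far more loci).
isolate_1 = {"locus_1": 3, "locus_2": 7, "locus_3": 1, "locus_4": 12, "locus_5": 4}
isolate_2 = {"locus_1": 3, "locus_2": 9, "locus_3": 1, "locus_4": None, "locus_5": 5}

print(allelic_distance(isolate_1, isolate_2))
```

Under this counting scheme a single transfer event that replaces a run of consecutive alleles adds one difference per locus, which is one way a few HGT events could inflate the apparent distance between otherwise near-identical isolates.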
3.  Integration of Genomic and Other Epidemiologic Data to Investigate and Control a Cross-Institutional Outbreak of Streptococcus pyogenes 
Emerging Infectious Diseases  2016;22(6):973-980.
Genomic surveillance can effectively detect such outbreaks, providing increased intelligence to support infection control.
Single-strain outbreaks of Streptococcus pyogenes infections are common and often go undetected. In 2013, two clusters of invasive group A Streptococcus (iGAS) infection were identified in independent but closely located care homes in Oxfordshire, United Kingdom. Investigation included visits to each home, chart review, staff survey, microbiologic sampling, and genome sequencing. S. pyogenes emm type 1.0, the most common circulating type nationally, was identified from all cases yielding GAS isolates. A tailored whole-genome reference population comprising epidemiologically relevant contemporaneous isolates and published isolates was assembled. Data were analyzed independently using whole-genome multilocus sequencing and single-nucleotide polymorphism analyses. Six isolates from staff and residents of the homes formed a single cluster that was separated from the reference population by both analytical approaches. No further cases occurred after mass chemoprophylaxis and enhanced infection control. Our findings demonstrate the ability of 2 independent analytical approaches to enable robust conclusions from nonstandardized whole-genome analysis to support public health practice.
PMCID: PMC4880081  PMID: 27192043
Group A Streptococcus; invasive group A streptococcal infection; iGAS; outbreak; genome; infection control; whole-genome sequencing; Streptococcus pyogenes; epidemiologic data; genomic data; investigation; bacteria; streptococci
4.  Phylogenetic Analysis of Invasive Serotype 1 Pneumococcus in South Africa, 1989 to 2013 
Journal of Clinical Microbiology  2016;54(5):1326-1334.
Serotype 1 is an important cause of invasive pneumococcal disease in South Africa and has declined following the introduction of the 13-valent pneumococcal conjugate vaccine in 2011. We genetically characterized 912 invasive serotype 1 isolates from 1989 to 2013. Simpson's diversity index (D) and recombination ratios were calculated. Factors associated with sequence types (STs) were assessed. Clonal complex 217 represented 96% (872/912) of the sampled isolates. Following the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13), ST diversity increased in children <5 years (D, 0.39 to 0.63, P = 0.002) and individuals >14 years (D, 0.35 to 0.54, P < 0.001): ST-217 declined proportionately in children <5 years (153/203 [75%] versus 21/37 [57%], P = 0.027) and individuals >14 years (242/305 [79%] versus 96/148 [65%], P = 0.001), whereas ST-9067 increased (4/684 [0.6%] versus 24/228 [11%], P < 0.001). Three subclades were identified within ST-217: ST-217C1 (353/382 [92%]), ST-217C2 (15/382 [4%]), and ST-217C3 (14/382 [4%]). ST-217C2, ST-217C3, and single-locus variant (SLV) ST-8314 (20/912 [2%]) were associated with nonsusceptibility to chloramphenicol, tetracycline, and co-trimoxazole. ST-8314 (20/912 [2%]) was also associated with increased nonsusceptibility to penicillin (P < 0.001). ST-217C3 and newly reported ST-9067 had higher recombination ratios than those of ST-217C1 (4.344 versus 0.091, P < 0.001; and 0.086 versus 0.013, P < 0.001, respectively). Increases in genetic diversity were noted post-PCV13, and lineages associated with antimicrobial nonsusceptibility were identified.
PMCID: PMC4844715  PMID: 26962082
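The abstract above reports Simpson's diversity index (D) without stating which form was used. The sketch below assumes the finite-sample form often applied to typing data, D = 1 - Σ n_i(n_i - 1) / (N(N - 1)), where n_i is the number of isolates of sequence type i and N is the total. The function name and the example sequence-type labels are hypothetical.

```python
from collections import Counter


def simpson_diversity(sequence_types):
    """Return Simpson's diversity index for a list of sequence-type labels.

    Assumed form: D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)).
    D is 0 when every isolate has the same type and 1 when all differ.
    """
    counts = Counter(sequence_types)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("at least two isolates are needed")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (total * (total - 1))


# Hypothetical sample: six isolates of one type, three of another, one of a third.
sample = ["ST-A"] * 6 + ["ST-B"] * 3 + ["ST-C"]
print(round(simpson_diversity(sample), 2))
```

With this form, a sample dominated by one sequence type gives a low D, so a proportional decline of the dominant type is consistent with a rising index.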
6.  Biofilm Morphotypes and Population Structure among Staphylococcus epidermidis from Commensal and Clinical Samples 
PLoS ONE  2016;11(3):e0151240.
Bacterial species comprise related genotypes that can display divergent phenotypes with important clinical implications. Staphylococcus epidermidis is a common cause of nosocomial infections and, critical to its pathogenesis, is its ability to adhere and form biofilms on surfaces, thereby moderating the effect of the host’s immune response and antibiotics. Commensal S. epidermidis populations are thought to differ from those associated with disease in factors involved in adhesion and biofilm accumulation. We quantified the differences in biofilm formation in 98 S. epidermidis isolates from various sources, and investigated population structure based on ribosomal multilocus typing (rMLST) and the presence/absence of genes involved in adhesion and biofilm formation. All isolates were able to adhere and form biofilms in in vitro growth assays and confocal microscopy allowed classification into 5 biofilm morphotypes based on their thickness, biovolume and roughness. Phylogenetic reconstruction grouped isolates into three separate clades, with the isolates in the main disease associated clade displaying diversity in morphotype. Of the biofilm morphology characteristics, only biofilm thickness had a significant association with clade distribution. The distribution of some known adhesion-associated genes (aap and sesE) among isolates showed a significant association with the species clonal frame. These data challenge the assumption that biofilm-associated genes, such as those on the ica operon, are genetic markers for less invasive S. epidermidis isolates, and suggest that phenotypic characteristics, such as adhesion and biofilm formation, are not fixed by clonal descent but are influenced by the presence of various genes that are mobile among lineages.
PMCID: PMC4792440  PMID: 26978068
7.  Genomic epidemiology of age-associated meningococcal lineages in national surveillance: an observational cohort study 
The Lancet. Infectious Diseases  2015;15(12):1420-1428.
Invasive meningococcal disease (IMD) is a worldwide health issue that is potentially preventable with vaccination. In view of its sporadic nature and the high diversity of Neisseria meningitidis, epidemiological surveillance incorporating detailed isolate characterisation is crucial for effective control and understanding the evolving epidemiology of IMD. The Meningitis Research Foundation Meningococcus Genome Library (MRF-MGL) exploits whole-genome sequencing (WGS) for this purpose and presents data on a comprehensive and coherent IMD isolate collection from England and Wales via the internet. We assessed the contribution of these data to investigating IMD epidemiology.
WGS data were obtained for all 899 IMD isolates available for England and Wales in epidemiological years 2010–11 and 2011–12. The data had been annotated at 1720 loci, analysed, and disseminated online. Information was also available on meningococcal population structure and vaccine (Bexsero, GlaxoSmithKline, Brentford, Middlesex, UK) antigen variants, which enabled the investigation of IMD-associated genotypes over time and by patients' age groups. Population genomic analyses were done with a hierarchical gene-by-gene approach.
The methods used by MRF-MGL efficiently characterised IMD isolates and information was provided in plain language. At least 20 meningococcal lineages were identified, three of which (hyperinvasive clonal complexes 41/44 [lineage 3], 269 [lineage 2], and 23 [lineage 23]) were responsible for 528 (59%) of IMD isolates. Lineages were highly diverse and showed evidence of extensive recombination. Specific lineages were associated with IMD in particular age groups, with notable diversity in the youngest and oldest individuals. The increased incidence of IMD from 1984 to 2010 in England and Wales was due to successive and concurrent epidemics of different lineages. Genetically, 74% of isolates were characterised as encoding group B capsules: 16% group Y, 6% group W, and 3% group C. Exact peptide matches for individual Bexsero vaccine antigens were present in up to 26% of isolates.
The MRF-MGL represents an effective, broadly applicable model for the storage, analysis, and dissemination of WGS data that can facilitate real-time genomic pathogen surveillance. The data revealed information crucial to effective deployment and assessment of vaccines against N meningitidis.
Meningitis Research Foundation, Wellcome Trust, Public Health England, European Union.
PMCID: PMC4655307  PMID: 26515523
8.  Genomic resolution of an aggressive, widespread, diverse and expanding meningococcal serogroup B, C and W lineage 
The Journal of Infection  2015;71(5):544-552.
Neisseria meningitidis is a leading cause of meningitis and septicaemia. The hyperinvasive ST-11 clonal complex (cc11) caused serogroup C (MenC) outbreaks in the US military in the 1960s and UK universities in the 1990s, a global Hajj-associated serogroup W (MenW) outbreak in 2000–2001, and subsequent MenW epidemics in sub-Saharan Africa. More recently, endemic MenW disease has expanded in South Africa, South America and the UK, and MenC cases have been reported among European and North American men who have sex with men (MSM). Routine typing schemes poorly resolve cc11 so we established the population structure at genomic resolution.
Representatives of these episodes and other geo-temporally diverse cc11 meningococci (n = 750) were compared across 1546 core genes and visualised on phylogenetic networks.
MenW isolates were confined to a distal portion of one of two main lineages with MenB and MenC isolates interspersed elsewhere. An expanding South American/UK MenW strain was distinct from the ‘Hajj outbreak’ strain and a closely related endemic South African strain. Recent MenC isolates from MSM in France and the UK were closely related but distinct.
High resolution ‘genomic’ multilocus sequence typing is necessary to resolve and monitor the spread of diverse cc11 lineages globally.
• The meningococcal ST-11 clonal complex is diverse.
• A ‘South American’ serogroup W strain is currently expanding in the UK.
• Supports decision to vaccinate UK teenagers against meningococcal serogroup W.
• A distinct endemic South African strain is related to the 2000 Hajj outbreak strain.
• Serogroup C cases among MSM in UK and France are related but distinct.
PMCID: PMC4635312  PMID: 26226598
Meningococcal; ST-11 clonal complex; Genome; Serogroup W; Serogroup C
9.  The Landscape of Realized Homologous Recombination in Pathogenic Bacteria 
Molecular Biology and Evolution  2015;33(2):456-471.
Recombination enhances the adaptive potential of organisms by allowing genetic variants to be tested on multiple genomic backgrounds. Its distribution in the genome can provide insight into the evolutionary forces that underlie traits, such as the emergence of pathogenicity. Here, we examined landscapes of realized homologous recombination of 500 genomes from ten bacterial species and found all species have “hot” regions with elevated rates relative to the genome average. We examined the size, gene content, and chromosomal features associated with these regions and the correlations between closely related species. The recombination landscape is variable and evolves rapidly. For example in Salmonella, only short regions of around 1 kb in length are hot whereas in the closely related species Escherichia coli, some hot regions exceed 100 kb, spanning many genes. Only Streptococcus pyogenes shows evidence for the positive correlation between GC content and recombination that has been reported for several eukaryotes. Genes with function related to the cell surface/membrane are often found in recombination hot regions but E. coli is the only species where genes annotated as “virulence associated” are consistently hotter. There is also evidence that some genes with “housekeeping” functions tend to be overrepresented in cold regions. For example, ribosomal proteins showed low recombination in all of the species. Among specific genes, transferrin-binding proteins are recombination hot in all three of the species in which they were found, and are subject to interspecies recombination.
PMCID: PMC4866539  PMID: 26516092
recombination; selection; pathogenicity; population genomics
10.  Genomics Reveals the Worldwide Distribution of Multidrug-Resistant Serotype 6E Pneumococci 
Journal of Clinical Microbiology  2015;53(7):2271-2285.
The pneumococcus is a leading pathogen infecting children and adults. Safe, effective vaccines exist, and they work by inducing antibodies to the polysaccharide capsule (unique for each serotype) that surrounds the cell; however, current vaccines are limited by the fact that only a few of the nearly 100 antigenically distinct serotypes are included in the formulations. Within the serotypes, serogroup 6 pneumococci are a frequent cause of serious disease and common colonizers of the nasopharynx in children. Serotype 6E was first reported in 2004 but was thought to be rare; however, we and others have detected serotype 6E among recent pneumococcal collections. Therefore, we analyzed a diverse data set of ∼1,000 serogroup 6 genomes, assessed the prevalence and distribution of serotype 6E, analyzed the genetic diversity among serogroup 6 pneumococci, and investigated whether pneumococcal conjugate vaccine-induced serotype 6A and 6B antibodies mediate the killing of serotype 6E pneumococci. We found that 43% of all genomes were of serotype 6E, and they were recovered worldwide from healthy children and patients of all ages with pneumococcal disease. Four genetic lineages, three of which were multidrug resistant, described ∼90% of the serotype 6E pneumococci. Serological assays demonstrated that vaccine-induced serotype 6B antibodies were able to elicit killing of serotype 6E pneumococci. We also revealed three major genetic clusters of serotype 6A capsular sequences, discovered a new hybrid 6C/6E serotype, and identified 44 examples of serotype switching. Therefore, while vaccines appear to offer protection against serotype 6E, genetic variants may reduce vaccine efficacy in the longer term because of the emergence of serotypes that can evade vaccine-induced immunity.
PMCID: PMC4473186  PMID: 25972423
11.  Genome-Based Characterization of Emergent Invasive Neisseria meningitidis Serogroup Y Isolates in Sweden from 1995 to 2012 
Journal of Clinical Microbiology  2015;53(7):2154-2162.
Invasive meningococcal disease (IMD) caused by Neisseria meningitidis serogroup Y has increased in Europe, especially in Scandinavia. In Sweden, serogroup Y is now the dominating serogroup, and in 2012, the serogroup Y disease incidence was 0.46/100,000 population. We previously showed that a strain type belonging to sequence type 23 was responsible for the increased prevalence of this serogroup in Sweden. The objective of this study was to investigate the serogroup Y emergence by whole-genome sequencing and compare the meningococcal population structure of Swedish invasive serogroup Y strains to those of other countries with different IMD incidence. Whole-genome sequencing was performed on invasive serogroup Y isolates from 1995 to 2012 in Sweden (n = 186). These isolates were compared to a collection of serogroup Y isolates from England, Wales, and Northern Ireland from 2010 to 2012 (n = 143), which had relatively low serogroup Y incidence, and two isolates obtained in 1999 in the United States, where serogroup Y remains one of the major causes of IMD. The meningococcal population structures were similar in the investigated regions; however, different strain types were prevalent in each geographic region. A number of genes known or hypothesized to have an impact on meningococcal virulence were shown to be associated with different strain types and subtypes. The reasons for the IMD increase are multifactorial and are influenced by increased virulence, host adaptive immunity, and transmission. Future genome-wide association studies are needed to reveal additional genes associated with serogroup Y meningococcal disease, and this work would benefit from a complete serogroup Y meningococcal reference genome.
PMCID: PMC4473204  PMID: 25926489
12.  Ecological Overlap and Horizontal Gene Transfer in Staphylococcus aureus and Staphylococcus epidermidis 
Genome Biology and Evolution  2015;7(5):1313-1328.
The opportunistic pathogens Staphylococcus aureus and Staphylococcus epidermidis represent major causes of severe nosocomial infection, and are associated with high levels of mortality and morbidity worldwide. These species are both common commensals on the human skin and in the nasal pharynx, but are genetically distinct, differing at 24% average nucleotide divergence in 1,478 core genes. To better understand the genome dynamics of these ecologically similar staphylococcal species, we carried out a comparative analysis of 324 S. aureus and S. epidermidis genomes, including 83 novel S. epidermidis sequences. A reference pan-genome approach and whole genome multilocus-sequence typing revealed that around half of the genome was shared between the species. Based on a BratNextGen analysis, homologous recombination was found to have impacted on 40% of the core genes in S. epidermidis, but on only 24% of the core genes in S. aureus. Homologous recombination between the species is rare, with a maximum of nine gene alleles shared between any two S. epidermidis and S. aureus isolates. In contrast, there was considerable interspecies admixture of mobile elements, in particular genes associated with the SaPIn1 pathogenicity island, metal detoxification, and the methicillin-resistance island SCCmec. Our data and analysis provide a context for considering the nature of recombinational boundaries between S. aureus and S. epidermidis and the selective forces that influence realized recombination between these species.
PMCID: PMC4453061  PMID: 25888688
Staphylococcus; evolution; ecology; recombination; nosocomial infections
13.  Neisseria Adhesin A Variation and Revised Nomenclature Scheme 
Neisseria adhesin A (NadA), involved in the adhesion and invasion of Neisseria meningitidis into host tissues, is one of the major components of Bexsero, a novel multicomponent vaccine licensed for protection against meningococcal serogroup B in Europe, Australia, and Canada. NadA has been identified in approximately 30% of clinical isolates and in a much lower proportion of carrier isolates. Three protein variants were originally identified in invasive meningococci and named NadA-1, NadA-2, and NadA-3, whereas most carrier isolates either lacked the gene or harbored a different variant, NadA-4. Further analysis of isolates belonging to the sequence type 213 (ST-213) clonal complex identified NadA-5, which was structurally similar to NadA-4, but more distantly related to NadA-1, -2, and -3. At the time of this writing, more than 89 distinct nadA allele sequences and 43 distinct peptides have been described. Here, we present a revised nomenclature system, taking into account the complete data set, which is compatible with previous classification schemes and is expandable. The main features of this new scheme include (i) the grouping of the previously named NadA-2 and NadA-3 variants into a single NadA-2/3 variant, (ii) the grouping of the previously assigned NadA-4 and NadA-5 variants into a single NadA-4/5 variant, (iii) the introduction of an additional variant (NadA-6), and (iv) the classification of the variants into two main groups, named groups I and II. To facilitate querying of the sequences and submission of new allele sequences, the nucleotide and amino acid sequences are available at
PMCID: PMC4097447  PMID: 24807056
14.  A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes 
BMC Genomics  2014;15(1):1138.
Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. The provision of data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype and used for analysis, is therefore necessary.
The performance of de novo short-read assembly followed by automatic annotation using the Neisseria database was assessed and evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and less than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis, which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.
The de novo assembly, combined with automated gene-by-gene annotation, generates high quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease causing lineages.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1138) contains supplementary material, which is available to authorized users.
PMCID: PMC4377854  PMID: 25523208
Neisseria meningitidis; de novo assembly; BIGSdb; Gene-by-gene analysis; cgMLST; rMLST; rST; Bacterial population genomics
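The core-genome definition used in the abstract above (loci present in at least 95% of the population) can be sketched as a simple presence filter. This is an illustration under stated assumptions, not the Genome Comparator implementation: the function `core_loci`, the isolate identifiers, and the locus names are hypothetical.

```python
from collections import Counter


def core_loci(presence, threshold=0.95):
    """Return loci found in at least `threshold` of the isolates.

    `presence` maps an isolate identifier to the set of loci detected in
    that isolate's assembly.
    """
    n_isolates = len(presence)
    if n_isolates == 0:
        return set()
    locus_counts = Counter(locus for loci in presence.values() for locus in loci)
    return {
        locus
        for locus, count in locus_counts.items()
        if count / n_isolates >= threshold
    }


# Hypothetical data: 20 isolates; locus_A in all, locus_B in 19, locus_C in 18.
presence = {}
for i in range(20):
    loci = {"locus_A"}
    if i < 19:
        loci.add("locus_B")
    if i < 18:
        loci.add("locus_C")
    presence[f"isolate_{i}"] = loci

print(sorted(core_loci(presence)))
```

Allowing a locus to be absent from a small fraction of isolates keeps genes that are missing only because of incomplete draft assemblies, at the cost of admitting a few loci that are genuinely variable.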
15.  Cronobacter, the emergent bacterial pathogen Enterobacter sakazakii comes of age; MLST and whole genome sequence analysis 
BMC Genomics  2014;15(1):1121.
Following the association of Cronobacter spp. with several publicized fatal outbreaks of meningitis and necrotising enterocolitis in neonatal intensive care units, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter, which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database ( containing over 1000 isolates with metadata, along with the recognition of specific clonal lineages linked to neonatal meningitis and adult infections.
Whole genome sequencing and multilocus sequence typing (MLST) support the formal recognition of the genus Cronobacter, composed of seven species, to replace the former single species Enterobacter sakazakii. Applying the 7-loci MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal-MLST (51-loci) and whole genome-MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains.
The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes ‘on the fly’, and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable and adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen. Whole genome sequencing has additionally shown a range of potential virulence and environmental fitness traits which may account for the association of C. sakazakii CC4 pathogenicity, and propensity for neonatal CNS.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1121) contains supplementary material, which is available to authorized users.
PMCID: PMC4377842  PMID: 25515150
Emergent bacterial pathogen; Cronobacter; MLST; Genomic analysis
16.  The domestication of the probiotic bacterium Lactobacillus acidophilus 
Scientific Reports  2014;4:7202.
Lactobacillus acidophilus is a Gram-positive lactic acid bacterium that has had widespread historical use in the dairy industry and more recently as a probiotic. Although L. acidophilus has been designated as safe for human consumption, increasing commercial regulation and clinical demands for probiotic validation has resulted in a need to understand its genetic diversity. By drawing on large, well-characterised collections of lactic acid bacteria, we examined L. acidophilus isolates spanning 92 years and including multiple strains in current commercial use. Analysis of the whole genome sequence data set (34 isolate genomes) demonstrated L. acidophilus was a low diversity, monophyletic species with commercial isolates essentially identical at the sequence level. Our results indicate that commercial use has domesticated L. acidophilus with genetically stable, invariant strains being consumed globally by the human population.
PMCID: PMC4244635  PMID: 25425319
17.  Cryptic ecology among host generalist Campylobacter jejuni in domestic animals 
Molecular Ecology  2014;23(10):2442-2451.
Homologous recombination between bacterial strains is theoretically capable of preventing the separation of daughter clusters, and producing cohesive clouds of genotypes in sequence space. However, numerous barriers to recombination are known. Barriers may be essential such as adaptive incompatibility, or ecological, which is associated with the opportunities for recombination in the natural habitat. Campylobacter jejuni is a gut colonizer of numerous animal species and a major human enteric pathogen. We demonstrate that the two major generalist lineages of C. jejuni do not show evidence of recombination with each other in nature, despite having a high degree of host niche overlap and recombining extensively with specialist lineages. However, transformation experiments show that the generalist lineages readily recombine with one another in vitro. This suggests ecological rather than essential barriers to recombination, caused by a cryptic niche structure within the hosts.
PMCID: PMC4237157  PMID: 24689900
adaptation; Campylobacter; genomics; recombination barriers
18.  TypOn: the microbial typing ontology 
Bacterial identification and characterization at subspecies level is commonly known as Microbial Typing. Currently, these methodologies are fundamental tools in Clinical Microbiology and bacterial population genetics studies to track outbreaks and to study the dissemination and evolution of virulence or pathogenicity factors and antimicrobial resistance. Due to advances in DNA sequencing technology, these methods have evolved to become focused on sequence-based methodologies. The need to have a common understanding of the concepts described and the ability to share results within the community at a global level are increasingly important requisites for the continued development of portable and accurate sequence-based typing methods, especially with the recent introduction of Next Generation Sequencing (NGS) technologies. In this paper, we present an ontology designed for the sequence-based microbial typing field, capable of describing any of the sequence-based typing methodologies currently in use and being developed, including novel NGS based methods. This is a fundamental step to accurately describe, analyze, curate, and manage information for microbial typing based on sequence based typing methods.
PMCID: PMC4290098  PMID: 25584183
Ontology; Knowledge representation; Microbial typing methods
19.  Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model 
PLoS Computational Biology  2014;10(8):e1003788.
The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
Author Summary
Whole genome sequencing has revolutionised the study of pathogenic microorganisms. It has also become so affordable that hundreds of samples can reasonably be sequenced in an individual project, creating a wealth of data. Estimating the bacterial core genome – traditionally defined as those genes present in all genomes – is an important initial step in population genomics analyses. We developed a simple statistical model to estimate the number of core genes in a bacterial genome dataset, calculated pairwise evolutionary distances (p-distances) based on differences among nucleotide sequences, and plotted the median p-distance for each core gene relative to its genome location. Low p-distance values indicate highly-conserved genes; high values suggest genes under selection and/or undergoing recombination. The genome diagrams depict areas of interest in genomes that can be explored in further detail. Using our method, we analysed five bacterial species comprising a total of 2096 genomes. This revealed new information related to antibiotic resistance and virulence for two bacterial species and demonstrated that the function of many core genes in bacteria is still unknown. Our model provides a highly-accessible, publicly-available tool to use on the vast quantities of genome sequence data now available.
PMCID: PMC4140633  PMID: 25144616
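The p-distance step described in this abstract can be illustrated with a minimal sketch. It assumes the alleles of one core gene are already aligned to equal length, and it skips any site where either sequence has a gap or an ambiguous base; that handling is an assumption, as the abstract does not say how such sites were treated. The function names `p_distance` and `median_p_distance` are hypothetical and are not taken from the authors' tool.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: only unambiguous bases are compared; gaps are skipped.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(alleles):
    """Median of all pairwise p-distances among the alleles of one gene."""
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration.
    toy_alleles = ["ACGT", "ACGA", "ACGT"]
    print(median_p_distance(toy_alleles))
```

Plotting the per-gene median against genome position, as the abstract describes, would then need only each gene's start coordinate alongside this value.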
20.  Implications of Differential Age Distribution of Disease-Associated Meningococcal Lineages for Vaccine Development 
New vaccines targeting meningococci expressing serogroup B polysaccharide have been developed, with some being licensed in Europe. Coverage depends on the distribution of disease-associated genotypes, which may vary by age. It is well established that a small number of hyperinvasive lineages account for most disease, and these lineages are associated with particular antigens, including vaccine candidates. A collection of 4,048 representative meningococcal disease isolates from 18 European countries, collected over a 3-year period, were characterized by multilocus sequence typing (MLST). Age data were available for 3,147 isolates. The proportions of hyperinvasive lineages, identified as particular clonal complexes (ccs) by MLST, differed among age groups. Subjects <1 year of age experienced lower risk of sequence type 11 (ST-11) cc, ST-32 cc, and ST-269 cc disease and higher risk of disease due to unassigned STs, 1- to 4-year-olds experienced lower risk of ST-11 cc and ST-32 cc disease, 5- to 14-year-olds were less likely to experience ST-11 cc and ST-269 cc disease, and ≥25-year-olds were more likely to experience disease due to less common ccs and unassigned STs. Younger and older subjects were vulnerable to a more diverse set of genotypes, indicating the more clonal nature of genotypes affecting adolescents and young adults. Knowledge of temporal and spatial diversity and the dynamics of meningococcal populations is essential for disease control by vaccines, as coverage is lineage specific. The nonrandom age distribution of hyperinvasive lineages has consequences for the design and implementation of vaccines, as different variants, or perhaps targets, may be required for different age groups.
PMCID: PMC4054250  PMID: 24695776
21.  Identifying Neisseria Species by Use of the 50S Ribosomal Protein L6 (rplF) Gene 
Journal of Clinical Microbiology  2014;52(5):1375-1381.
The comparison of 16S rRNA gene sequences is widely used to differentiate bacteria; however, this gene can lack resolution among closely related but distinct members of the same genus. This is a problem in clinical situations in those genera, such as Neisseria, where some species are associated with disease while others are not. Here, we identified and validated an alternative genetic target common to all Neisseria species which can be readily sequenced to provide an assay that rapidly and accurately discriminates among members of the genus. Ribosomal multilocus sequence typing (rMLST) using ribosomal protein genes has been shown to unambiguously identify these bacteria. The PubMLST Neisseria database was queried to extract the 53 ribosomal protein gene sequences from 44 genomes from diverse species. Phylogenies reconstructed from these genes were examined, and a single 413-bp fragment of the 50S ribosomal protein L6 (rplF) gene was identified which produced a phylogeny that was congruent with the phylogeny reconstructed from concatenated ribosomal protein genes. Primers that enabled the amplification and direct sequencing of the rplF gene fragment were designed to validate the assay in vitro and in silico. Allele sequences were defined for the gene fragment, associated with particular species names, and stored on the PubMLST Neisseria database, providing a curated electronic resource. This approach provides an alternative to 16S rRNA gene sequencing, which can be readily replicated for other organisms for which more resolution is required, and it has potential applications in high-resolution metagenomic studies.
PMCID: PMC3993661  PMID: 24523465
22.  Population structure of the Yersinia pseudotuberculosis complex according to multilocus sequence typing 
Environmental Microbiology  2011;13(12):3114-3127.
Multilocus sequence analysis of 417 strains of Yersinia pseudotuberculosis revealed that it is a complex of four populations, three of which have been previously assigned species status [Y. pseudotuberculosis sensu stricto (s.s.), Yersinia pestis and Yersinia similis] and a fourth population, which we refer to as the Korean group, which may be in the process of speciation. We detected clear signs of recombination within Y. pseudotuberculosis s.s. as well as imports from Y. similis and the Korean group. The sources of genetic diversification within Y. pseudotuberculosis s.s. were approximately equally divided between recombination and mutation, whereas recombination has not yet been demonstrated in Y. pestis, which is also much more genetically monomorphic than is Y. pseudotuberculosis s.s. Most Y. pseudotuberculosis s.s. belong to a diffuse group of sequence types lacking clear population structure, although this species contains a melibiose-negative clade that is present globally in domesticated animals. Yersinia similis corresponds to the previously identified Y. pseudotuberculosis genetic type G4, which is probably not pathogenic because it lacks the virulence factors that are typical for Y. pseudotuberculosis s.s. In contrast, Y. pseudotuberculosis s.s., the Korean group and Y. pestis can all cause disease in humans.
PMCID: PMC3988354  PMID: 21951486
23.  MLST revisited: the gene-by-gene approach to bacterial genomics 
Nature Reviews. Microbiology  2013;11(10):728-736.
Multilocus sequence typing (MLST) was proposed in 1998 as a portable sequence-based method for identifying clonal relationships among bacteria. Today, in the whole-genome era of microbiology, the need for systematic, standardized descriptions of bacterial genotypic variation remains a priority. Here, to meet this need, we draw on the successes of MLST and 16S rRNA gene sequencing to propose a hierarchical gene-by-gene approach that reflects functional and evolutionary relationships and catalogues bacteria ‘from domain to strain’. Our gene-based typing approach using online platforms such as the Bacterial Isolate Genome Sequence Database (BIGSdb) allows the scalable organization and analysis of whole-genome sequence data.
PMCID: PMC3980634  PMID: 23979428
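The gene-by-gene idea behind MLST can be sketched as a pair of lookups: each locus sequence is matched to a numbered allele, and the resulting allele profile is matched to a sequence type (ST). The sketch below is illustrative only. The locus names, sequences, allele numbers and ST numbers are made-up toy values, and `call_profile` and `assign_st` are hypothetical helpers, not part of BIGSdb or any other real API.

```python
def call_profile(genome_loci, allele_db):
    """Map each locus sequence in a genome to its known allele number.

    Returns None for a locus whose sequence is missing or not yet defined.
    """
    profile = {}
    for locus, known_alleles in allele_db.items():
        sequence = genome_loci.get(locus)
        profile[locus] = known_alleles.get(sequence)
    return profile


def assign_st(profile, st_table, loci):
    """Look up the sequence type for an allele profile, or None if unassigned."""
    key = tuple(profile[locus] for locus in loci)
    return st_table.get(key)


if __name__ == "__main__":
    # Toy scheme with three hypothetical loci (real MLST schemes typically use seven).
    loci = ["locusA", "locusB", "locusC"]
    allele_db = {
        "locusA": {"ACGT": 1, "ACGA": 2},
        "locusB": {"GGCC": 1},
        "locusC": {"TTAA": 1, "TTAG": 2},
    }
    st_table = {(1, 1, 1): 10, (2, 1, 2): 20}

    genome = {"locusA": "ACGA", "locusB": "GGCC", "locusC": "TTAG"}
    profile = call_profile(genome, allele_db)
    print(profile, assign_st(profile, st_table, loci))
```

Because the same lookup works for any set of loci, extending the scheme from a handful of housekeeping genes to ribosomal or core-genome loci only changes the contents of the tables, which is the scalability the abstract points to.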
24.  Ribosomal proteins as biomarkers for bacterial identification by mass spectrometry in the clinical microbiology laboratory 
Whole-cell matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) is a rapid method for identification of microorganisms that is increasingly used in microbiology laboratories. This identification is based on the comparison of the tested isolate mass spectrum with reference databases. Using Neisseria meningitidis as a model organism, we showed that in one of the available databases, the Andromas database, 10 of the 13 species-specific biomarkers correspond to ribosomal proteins. Remarkably, one biomarker, ribosomal protein L32, was subject to inter-strain variability. The analysis of the ribosomal protein patterns of 100 isolates for which whole genome sequences were available confirmed the presence of inter-strain variability in the molecular weight of 29 ribosomal proteins, thus establishing a correlation between the sequence type (ST) and/or clonal complex (CC) of each strain and its ribosomal protein pattern. Since the molecular weights of three of the variable ribosomal proteins (L30, L31 and L32) fall within the spectral window observed by MALDI-TOF MS in clinical microbiology, i.e., 3640–12000 m/z, we were able to classify each strain into one of six subgroups by analyzing the molecular weights of these three ribosomal proteins, with each subgroup corresponding to specific STs and/or CCs. Their detection by MALDI-TOF therefore allows rapid typing of N. meningitidis isolates.
PMCID: PMC3980635  PMID: 23916798
Mass spectrometry; Ribosomal proteins; Biomarkers; Neisseria meningitidis
25.  Automated extraction of typing information for bacterial pathogens from whole genome sequence data: Neisseria meningitidis as an exemplar 
Whole genome sequence (WGS) data are becoming a major means of characterising samples of bacterial pathogens. These data have the advantage of providing detailed information on the genotypes and likely phenotypes of aetiological agents, enabling the relationships of samples from potential disease outbreaks to be established precisely. However, the generation of increasing quantities of sequence data does not, in itself, resolve the problems that a wide variety of microbiological typing methods have addressed over the last 100 years or so; indeed, the provision of very high volumes of unstructured data can confuse rather than resolve these issues. Here we review the nascent field of the storage of WGS data for clinical application and show how curated sequence-based typing schemes hosted on websites, accumulated over the past 14 years or so, have generated an infrastructure that can be used to exploit WGS for bacterial typing efficiently. We review the tools that have been implemented within such websites to extract clinically useful strain characterisation information, which can be provided to physicians and public health scientists and officials in a timely, concise and understandable way. These data can be used to inform medical decisions such as how to treat a patient, whether to institute public health action, and what action might be appropriate. The information is compatible both with previous sequence-based typing data and also with data that can be obtained in the absence of WGS data, for example by real-time PCR tests, providing a flexible infrastructure for WGS-based clinical microbiology.
PMCID: PMC3977036  PMID: 23369391
Whole genome sequencing; antimicrobial resistance; MLST; antigen typing; meningococcus; epidemiology
