1.  Neisseria Adhesin A Variation and Revised Nomenclature Scheme 
Neisseria adhesin A (NadA), involved in the adhesion and invasion of Neisseria meningitidis into host tissues, is one of the major components of Bexsero, a novel multicomponent vaccine licensed for protection against meningococcal serogroup B in Europe, Australia, and Canada. NadA has been identified in approximately 30% of clinical isolates and in a much lower proportion of carrier isolates. Three protein variants were originally identified in invasive meningococci and named NadA-1, NadA-2, and NadA-3, whereas most carrier isolates either lacked the gene or harbored a different variant, NadA-4. Further analysis of isolates belonging to the sequence type 213 (ST-213) clonal complex identified NadA-5, which was structurally similar to NadA-4, but more distantly related to NadA-1, -2, and -3. At the time of this writing, more than 89 distinct nadA allele sequences and 43 distinct peptides have been described. Here, we present a revised nomenclature system, taking into account the complete data set, which is compatible with previous classification schemes and is expandable. The main features of this new scheme include (i) the grouping of the previously named NadA-2 and NadA-3 variants into a single NadA-2/3 variant, (ii) the grouping of the previously assigned NadA-4 and NadA-5 variants into a single NadA-4/5 variant, (iii) the introduction of an additional variant (NadA-6), and (iv) the classification of the variants into two main groups, named groups I and II. To facilitate querying of the sequences and submission of new allele sequences, the nucleotide and amino acid sequences are available at
PMCID: PMC4097447  PMID: 24807056
2.  The domestication of the probiotic bacterium Lactobacillus acidophilus 
Scientific Reports  2014;4:7202.
Lactobacillus acidophilus is a Gram-positive lactic acid bacterium that has had widespread historical use in the dairy industry and more recently as a probiotic. Although L. acidophilus has been designated as safe for human consumption, increasing commercial regulation and clinical demands for probiotic validation have resulted in a need to understand its genetic diversity. By drawing on large, well-characterised collections of lactic acid bacteria, we examined L. acidophilus isolates spanning 92 years and including multiple strains in current commercial use. Analysis of the whole genome sequence data set (34 isolate genomes) demonstrated L. acidophilus was a low diversity, monophyletic species with commercial isolates essentially identical at the sequence level. Our results indicate that commercial use has domesticated L. acidophilus with genetically stable, invariant strains being consumed globally by the human population.
PMCID: PMC4244635  PMID: 25425319
3.  Cryptic ecology among host generalist Campylobacter jejuni in domestic animals 
Molecular Ecology  2014;23(10):2442-2451.
Homologous recombination between bacterial strains is theoretically capable of preventing the separation of daughter clusters, and producing cohesive clouds of genotypes in sequence space. However, numerous barriers to recombination are known. Barriers may be essential such as adaptive incompatibility, or ecological, which is associated with the opportunities for recombination in the natural habitat. Campylobacter jejuni is a gut colonizer of numerous animal species and a major human enteric pathogen. We demonstrate that the two major generalist lineages of C. jejuni do not show evidence of recombination with each other in nature, despite having a high degree of host niche overlap and recombining extensively with specialist lineages. However, transformation experiments show that the generalist lineages readily recombine with one another in vitro. This suggests ecological rather than essential barriers to recombination, caused by a cryptic niche structure within the hosts.
PMCID: PMC4237157  PMID: 24689900
adaptation; Campylobacter; genomics; recombination barriers
4.  Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model 
PLoS Computational Biology  2014;10(8):e1003788.
The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
Author Summary
Whole genome sequencing has revolutionised the study of pathogenic microorganisms. It has also become so affordable that hundreds of samples can reasonably be sequenced in an individual project, creating a wealth of data. Estimating the bacterial core genome – traditionally defined as those genes present in all genomes – is an important initial step in population genomics analyses. We developed a simple statistical model to estimate the number of core genes in a bacterial genome dataset, calculated pairwise evolutionary distances (p-distances) based on differences among nucleotide sequences, and plotted the median p-distance for each core gene relative to its genome location. Low p-distance values indicate highly-conserved genes; high values suggest genes under selection and/or undergoing recombination. The genome diagrams depict areas of interest in genomes that can be explored in further detail. Using our method, we analysed five bacterial species comprising a total of 2096 genomes. This revealed new information related to antibiotic resistance and virulence for two bacterial species and demonstrated that the function of many core genes in bacteria is still unknown. Our model provides a highly-accessible, publicly-available tool to use on the vast quantities of genome sequence data now available.
PMCID: PMC4140633  PMID: 25144616
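The p-distance calculation described in entry 4 can be illustrated with a minimal sketch. It assumes the allele sequences for one core gene are already aligned to equal length, and it skips any site where either sequence has a gap or ambiguous base; that handling is an assumption of this sketch, not something the abstract specifies. The function names `p_distance` and `median_p_distance` are hypothetical.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: only unambiguous A/C/G/T sites are compared.
        if base_a in "ACGT" and base_b in "ACGT":
            compared += 1
            if base_a != base_b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def median_p_distance(aligned_alleles: list[str]) -> float:
    """Median of all pairwise p-distances for one gene."""
    if len(aligned_alleles) < 2:
        raise ValueError("need at least two sequences")
    return median(
        p_distance(a, b) for a, b in combinations(aligned_alleles, 2)
    )


if __name__ == "__main__":
    gene_alleles = ["AAAA", "AAAT", "AATT"]
    print(median_p_distance(gene_alleles))
```

Plotting the resulting median for each core gene against its genome position would give the kind of diagram the abstract describes; a low value points to a conserved gene and a high value to a more diverse one.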
5.  Implications of Differential Age Distribution of Disease-Associated Meningococcal Lineages for Vaccine Development 
New vaccines targeting meningococci expressing serogroup B polysaccharide have been developed, with some being licensed in Europe. Coverage depends on the distribution of disease-associated genotypes, which may vary by age. It is well established that a small number of hyperinvasive lineages account for most disease, and these lineages are associated with particular antigens, including vaccine candidates. A collection of 4,048 representative meningococcal disease isolates from 18 European countries, collected over a 3-year period, were characterized by multilocus sequence typing (MLST). Age data were available for 3,147 isolates. The proportions of hyperinvasive lineages, identified as particular clonal complexes (ccs) by MLST, differed among age groups. Subjects <1 year of age experienced lower risk of sequence type 11 (ST-11) cc, ST-32 cc, and ST-269 cc disease and higher risk of disease due to unassigned STs, 1- to 4-year-olds experienced lower risk of ST-11 cc and ST-32 cc disease, 5- to 14-year-olds were less likely to experience ST-11 cc and ST-269 cc disease, and ≥25-year-olds were more likely to experience disease due to less common ccs and unassigned STs. Younger and older subjects were vulnerable to a more diverse set of genotypes, indicating the more clonal nature of genotypes affecting adolescents and young adults. Knowledge of temporal and spatial diversity and the dynamics of meningococcal populations is essential for disease control by vaccines, as coverage is lineage specific. The nonrandom age distribution of hyperinvasive lineages has consequences for the design and implementation of vaccines, as different variants, or perhaps targets, may be required for different age groups.
PMCID: PMC4054250  PMID: 24695776
6.  Identifying Neisseria Species by Use of the 50S Ribosomal Protein L6 (rplF) Gene 
Journal of Clinical Microbiology  2014;52(5):1375-1381.
The comparison of 16S rRNA gene sequences is widely used to differentiate bacteria; however, this gene can lack resolution among closely related but distinct members of the same genus. This is a problem in clinical situations in those genera, such as Neisseria, where some species are associated with disease while others are not. Here, we identified and validated an alternative genetic target common to all Neisseria species which can be readily sequenced to provide an assay that rapidly and accurately discriminates among members of the genus. Ribosomal multilocus sequence typing (rMLST) using ribosomal protein genes has been shown to unambiguously identify these bacteria. The PubMLST Neisseria database was queried to extract the 53 ribosomal protein gene sequences from 44 genomes from diverse species. Phylogenies reconstructed from these genes were examined, and a single 413-bp fragment of the 50S ribosomal protein L6 (rplF) gene was identified which produced a phylogeny that was congruent with the phylogeny reconstructed from concatenated ribosomal protein genes. Primers that enabled the amplification and direct sequencing of the rplF gene fragment were designed to validate the assay in vitro and in silico. Allele sequences were defined for the gene fragment, associated with particular species names, and stored on the PubMLST Neisseria database, providing a curated electronic resource. This approach provides an alternative to 16S rRNA gene sequencing, which can be readily replicated for other organisms for which more resolution is required, and it has potential applications in high-resolution metagenomic studies.
PMCID: PMC3993661  PMID: 24523465
7.  Population structure of the Yersinia pseudotuberculosis complex according to multilocus sequence typing 
Environmental microbiology  2011;13(12):3114-3127.
Multilocus sequence analysis of 417 strains of Yersinia pseudotuberculosis revealed that it is a complex of four populations, three of which have been previously assigned species status [Y. pseudotuberculosis sensu stricto (s.s.), Yersinia pestis and Yersinia similis] and a fourth population, which we refer to as the Korean group, which may be in the process of speciation. We detected clear signs of recombination within Y. pseudotuberculosis s.s. as well as imports from Y. similis and the Korean group. The sources of genetic diversification within Y. pseudotuberculosis s.s. were approximately equally divided between recombination and mutation, whereas recombination has not yet been demonstrated in Y. pestis, which is also much more genetically monomorphic than is Y. pseudotuberculosis s.s. Most Y. pseudotuberculosis s.s. belong to a diffuse group of sequence types lacking clear population structure, although this species contains a melibiose-negative clade that is present globally in domesticated animals. Yersinia similis corresponds to the previously identified Y. pseudotuberculosis genetic type G4, which is probably not pathogenic because it lacks the virulence factors that are typical for Y. pseudotuberculosis s.s. In contrast, Y. pseudotuberculosis s.s., the Korean group and Y. pestis can all cause disease in humans.
PMCID: PMC3988354  PMID: 21951486
8.  MLST revisited: the gene-by-gene approach to bacterial genomics 
Nature reviews. Microbiology  2013;11(10):728-736.
Multilocus sequence typing (MLST) was proposed in 1998 as a portable sequence-based method for identifying clonal relationships among bacteria. Today, in the whole-genome era of microbiology, the need for systematic, standardized descriptions of bacterial genotypic variation remains a priority. Here, to meet this need, we draw on the successes of MLST and 16S rRNA gene sequencing to propose a hierarchical gene-by-gene approach that reflects functional and evolutionary relationships and catalogues bacteria ‘from domain to strain’. Our gene-based typing approach using online platforms such as the Bacterial Isolate Genome Sequence Database (BIGSdb) allows the scalable organization and analysis of whole-genome sequence data.
PMCID: PMC3980634  PMID: 23979428
9.  Ribosomal proteins as biomarkers for bacterial identification by mass spectrometry in the clinical microbiology laboratory 
Whole-cell matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) is a rapid method for identification of microorganisms that is increasingly used in microbiology laboratories. This identification is based on the comparison of the tested isolate mass spectrum with reference databases. Using Neisseria meningitidis as a model organism, we showed that in one of the available databases, the Andromas database, 10 of the 13 species-specific biomarkers correspond to ribosomal proteins. Remarkably, one biomarker, ribosomal protein L32, was subject to inter-strain variability. The analysis of the ribosomal protein patterns of 100 isolates for which whole genome sequences were available confirmed the presence of inter-strain variability in the molecular weight of 29 ribosomal proteins, thus establishing a correlation between the sequence type (ST) and/or clonal complex (CC) of each strain and its ribosomal protein pattern. Since the molecular weight of three of the variable ribosomal proteins (L30, L31 and L32) was included in the spectral window observed by MALDI-TOF MS in clinical microbiology, i.e., 3640–12000 m/z, we were able by analyzing the molecular weight of these three ribosomal proteins to classify each strain in one of six subgroups, each of these subgroups corresponding to specific STs and/or CCs. Detection of these proteins by MALDI-TOF therefore allows quick typing of N. meningitidis isolates.
PMCID: PMC3980635  PMID: 23916798
Mass spectrometry; Ribosomal proteins; Biomarkers; Neisseria meningitidis
10.  Automated extraction of typing information for bacterial pathogens from whole genome sequence data: Neisseria meningitidis as an exemplar 
Whole genome sequence (WGS) data are becoming a major means of characterising samples of bacterial pathogens. These data have the advantage of providing detailed information on the genotypes and likely phenotypes of aetiological agents, enabling the relationships of samples from potential disease outbreaks to be established precisely. However, the generation of increasing quantities of sequence data does not, in itself, resolve the problems that a wide variety of microbiological typing methods have addressed over the last 100 years or so; indeed, the provision of very high volumes of unstructured data can confuse rather than resolve these issues. Here we review the nascent field of the storage of WGS data for clinical application and show how curated sequence-based typing schemes on websites such as, accumulated over the past 14 years or so, has generated an infrastructure that can be used to exploit WGS for bacterial typing efficiently. We review the tools that have been implemented within the website to extract clinically useful, strain characterisation information which can be provided to physicians and public health scientists and officials in a timely, concise and understandable way. These data can be used to inform medical decisions such as how to treat a patient, whether to institute public health action, and what action might be appropriate. The information is compatible both with previous sequence-based typing data and also with data that can be obtained in the absence of WGS data, for example by real-time PCR tests, providing a flexible infrastructure for WGS-based clinical microbiology.
PMCID: PMC3977036  PMID: 23369391
Whole genome sequencing; antimicrobial resistance; MLST; antigen typing; meningococcus; epidemiology
11.  Sequence, distribution and chromosomal context of class I and class II pilin genes of Neisseria meningitidis identified in whole genome sequences 
BMC Genomics  2014;15:253.
Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections.
We analysed over 600 publicly available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE; however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species.
Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these isolates, although horizontal transfer may generate class II pilin diversity. This study supports the suggestion that high frequency antigenic variation of pilin is not universal in pathogenic Neisseria.
PMCID: PMC4023411  PMID: 24690385
Type four pilus; Neisseria meningitidis; Class I pilin; Class II pilin; Antigenic variation
12.  A common gene pool for the Neisseria FetA antigen 
Meningococcal FetA is an iron-regulated, immunogenic outer membrane protein and vaccine component. The most diverse region of this protein is a previously defined variable region (VR) that has been shown to be immunodominant. In this analysis, a total of 275 Neisseria lactamica isolates, collected during studies of nasopharyngeal bacterial carriage in infants were examined for the presence of a fetA gene. The fetA VR nucleotide sequence was determined for 217 of these isolates, with fetA apparently absent from 58 isolates, the majority of which belonged to the ST-624 clonal complex. The VR in N. lactamica was compared to the same region in Neisseria meningitidis, Neisseria gonorrhoeae and a number of other commensal Neisseria. Identical fetA variable region sequences were identified among commensal and pathogenic Neisseria, suggesting a common gene pool, differing from other antigens in this respect. Carriage of commensal Neisseria species, such as N. lactamica, that express FetA may be involved in the development of natural immunity to meningococcal disease.
PMCID: PMC3968273  PMID: 18718812
FetA; Neisseria meningitidis; commensal Neisseria; gene pool
13.  A Reference Pan-Genome Approach to Comparative Bacterial Genomics: Identification of Novel Epidemiological Markers in Pathogenic Campylobacter 
PLoS ONE  2014;9(3):e92798.
The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating it to disease epidemiology. Existing approaches typically focus on either homologous sequence variation in genes that are shared by all isolates, or non-homologous sequence variation - focusing on genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates a 7-isolate reference pan-genome, of 3,933 loci, was defined. A core genome of 1,035 genes was ubiquitous in the sample accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci with gene presence defined as a BLAST match of ≥70% identity over ≥50% of the locus length - aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.
PMCID: PMC3968026  PMID: 24676150
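The gene-presence rule quoted in entry 13 (a BLAST match of ≥70% identity over ≥50% of the locus length) can be expressed as a small filter. This is only a sketch of the threshold logic: the `Hit` record and its field names are hypothetical, and it assumes the BLAST results have already been parsed into percent identity, alignment length and reference locus length.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one parsed BLAST match against a locus."""
    locus: str
    percent_identity: float  # 0-100
    alignment_length: int    # aligned positions on the reference locus
    locus_length: int        # full length of the reference locus


def is_present(hit: Hit, min_identity: float = 70.0,
               min_coverage: float = 0.5) -> bool:
    """Apply the >=70% identity over >=50% of locus length rule."""
    coverage = hit.alignment_length / hit.locus_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


def presence_profile(hits: list[Hit], loci: list[str]) -> dict[str, bool]:
    """Map every pan-genome locus to present/absent for one genome."""
    present = {hit.locus for hit in hits if is_present(hit)}
    return {locus: locus in present for locus in loci}


if __name__ == "__main__":
    hits = [
        Hit("locus_a", 70.0, 50, 100),   # exactly on both thresholds
        Hit("locus_b", 69.9, 100, 100),  # identity too low
        Hit("locus_c", 90.0, 49, 100),   # coverage too low
    ]
    print(presence_profile(hits, ["locus_a", "locus_b", "locus_c", "locus_d"]))
```

Comparing such profiles across genomes is one way to list loci found in only one species, as the abstract reports for C. coli and C. jejuni.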
14.  Recombinational Switching of the Clostridium difficile S-Layer and a Novel Glycosylation Gene Cluster Revealed by Large-Scale Whole-Genome Sequencing 
The Journal of Infectious Diseases  2012;207(4):675-686.
Background. Clostridium difficile is a major cause of nosocomial diarrhea, with 30-day mortality reaching 30%. The cell surface comprises a paracrystalline proteinaceous S-layer encoded by the slpA gene within the cell wall protein (cwp) gene cluster. Our purpose was to understand the diversity and evolution of slpA and nearby genes also encoding immunodominant cell surface antigens.
Methods. Whole-genome sequences were determined for 57 C. difficile isolates representative of the population structure and different clinical phenotypes. Phylogenetic analyses were performed on their genomic region (>63 kb) spanning the cwp cluster.
Results. Genetic diversity across the cwp cluster peaked within slpA, cwp66 (adhesin), and secA2 (secretory translocase). These genes formed a 10-kb cassette, of which 12 divergent variants were found. Homologous recombination involving this cassette caused it to associate randomly with genotype. One cassette contained a novel insertion (length, approximately 24 kb) that resembled S-layer glycosylation gene clusters.
Conclusions. Genetic exchange of S-layer cassettes parallels polysaccharide capsular switching in other species. Both cause major antigenic shifts, while the remainder of the genome is unchanged. C. difficile genotype is therefore not predictive of antigenic type. S-layer switching and immune escape could help explain temporal and geographic variation in C. difficile epidemiology and may inform genotyping and vaccination strategies.
PMCID: PMC3549603  PMID: 23204167
Clostridium difficile; S-layer; S-layer glycosylation; immunodominant antigen; recombination; switching; multilocus sequence type; genotype; evolution
15.  Target Gene Sequencing To Define the Susceptibility of Neisseria meningitidis to Ciprofloxacin 
Meningococcal gyrA gene sequence data, MICs, and mouse infection were used to define the ciprofloxacin breakpoint for Neisseria meningitidis. Residue T91 or D95 of GyrA was altered in all meningococcal isolates with MICs of ≥0.064 μg/ml but not among isolates with MICs of ≤0.032 μg/ml. Experimental infection of ciprofloxacin-treated mice showed slower bacterial clearance when GyrA was altered. These data suggest a MIC of ≥0.064 μg/ml as the ciprofloxacin breakpoint for meningococci and argue for the molecular detection of ciprofloxacin resistance.
PMCID: PMC3623314  PMID: 23357770
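The two checks reported in entry 15 can be sketched as follows. The code assumes a GyrA amino acid sequence numbered from residue 1 with no alignment offset, which is a simplification, and the category labels and function names are hypothetical. The 0.064 μg/ml threshold and the T91 and D95 positions are taken from the abstract.

```python
BREAKPOINT_UG_PER_ML = 0.064  # ciprofloxacin breakpoint suggested in the abstract


def mic_category(mic_ug_per_ml: float) -> str:
    """Label an isolate relative to the suggested breakpoint."""
    if mic_ug_per_ml >= BREAKPOINT_UG_PER_ML:
        return "at_or_above_breakpoint"
    return "below_breakpoint"


def gyra_altered(gyra_protein: str) -> bool:
    """True if residue 91 is not T or residue 95 is not D.

    Assumption: the sequence starts at residue 1, so residue 91 is
    index 90 and residue 95 is index 94.
    """
    if len(gyra_protein) < 95:
        raise ValueError("sequence too short to contain residue 95")
    return gyra_protein[90] != "T" or gyra_protein[94] != "D"


if __name__ == "__main__":
    # Placeholder sequences for illustration only, not real GyrA.
    wild_type = "A" * 90 + "T" + "AAA" + "D"
    variant = "A" * 90 + "I" + "AAA" + "D"
    print(gyra_altered(wild_type), gyra_altered(variant))
    print(mic_category(0.032), mic_category(0.064))
```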
16.  Genome sequence analyses show that Neisseria oralis is the same species as ‘Neisseria mucosa var. heidelbergensis’ 
Phylogenies generated from whole genome sequence (WGS) data provide definitive means of bacterial isolate characterization for typing and taxonomy. The species status of strains recently defined with conventional taxonomic approaches as representing Neisseria oralis was examined by the analysis of sequences derived from WGS data, specifically: (i) 53 Neisseria ribosomal protein subunit (rps) genes (ribosomal multi-locus sequence typing, rMLST); and (ii) 246 Neisseria core genes (core genome MLST, cgMLST). These data were compared with phylogenies derived from 16S and 23S rRNA gene sequences, demonstrating that the N. oralis strains were monophyletic with strains described previously as representing ‘Neisseria mucosa var. heidelbergensis’ and that this group was of equivalent taxonomic status to other well-described species of the genus Neisseria. Phylogenetic analyses also indicated that Neisseria sicca and Neisseria macacae should be considered the same species as Neisseria mucosa and that Neisseria flavescens should be considered the same species as Neisseria subflava. Analyses using rMLST showed that some strains currently defined as belonging to the genus Neisseria were more closely related to species belonging to other genera within the family; however, whole genome analysis of a more comprehensive selection of strains from within the family Neisseriaceae would be necessary to confirm this. We suggest that strains previously identified as representing ‘N. mucosa var. heidelbergensis’ and deposited in culture collections should be renamed N. oralis. Finally, one of the strains of N. oralis was able to ferment lactose, due to the presence of β-galactosidase and lactose permease genes, a characteristic previously thought to be unique to Neisseria lactamica, which therefore cannot be thought of as diagnostic for this species; however, the rMLST and cgMLST analyses confirm that N. oralis is most closely related to N. mucosa.
PMCID: PMC3799226  PMID: 24097834
17.  Real-Time Genomic Epidemiological Evaluation of Human Campylobacter Isolates by Use of Whole-Genome Multilocus Sequence Typing 
Journal of Clinical Microbiology  2013;51(8):2526-2534.
Sequence-based typing is essential for understanding the epidemiology of Campylobacter infections, a major worldwide cause of bacterial gastroenteritis. We demonstrate the practical and rapid exploitation of whole-genome sequencing to provide routine definitive characterization of Campylobacter jejuni and Campylobacter coli for clinical and public health purposes. Short-read data from 384 Campylobacter clinical isolates collected over 4 months in Oxford, United Kingdom, were assembled de novo. Contigs were deposited at the website and automatically annotated for 1,667 loci. Typing and phylogenetic information was extracted and comparative analyses were performed for various subsets of loci, up to the level of the whole genome, using the Genome Comparator and Neighbor-net algorithms. The assembled sequences (for 379 isolates) were diverse and resembled collections from previous studies of human campylobacteriosis. Small subsets of very closely related isolates originated mainly from repeated sampling from the same patients and, in one case, likely laboratory contamination. Much of the within-patient variation occurred in phase-variable genes. Clinically and epidemiologically informative data can be extracted from whole-genome sequence data in real time with straightforward, publicly available tools. These analyses are highly scalable, are transparent, do not require closely related genome reference sequences, and provide improved resolution (i) among Campylobacter clonal complexes and (ii) between very closely related isolates. Additionally, these analyses rapidly differentiated unrelated isolates, allowing the detection of single-strain clusters. The approach is widely applicable to analyses of human bacterial pathogens in real time in clinical laboratories, with little specialist training required.
PMCID: PMC3719633  PMID: 23698529
18.  Description and Nomenclature of Neisseria meningitidis Capsule Locus 
Emerging Infectious Diseases  2013;19(4):566-573.
Pathogenic Neisseria meningitidis isolates contain a polysaccharide capsule that is the main virulence determinant for this bacterium. Thirteen capsular polysaccharides have been described, and nuclear magnetic resonance spectroscopy has enabled determination of the structure of capsular polysaccharides responsible for serogroup specificity. Molecular mechanisms involved in N. meningitidis capsule biosynthesis have also been identified, and genes involved in this process and in cell surface translocation are clustered at a single chromosomal locus termed cps. The use of multiple names for some of the genes involved in capsule synthesis, combined with the need for rapid diagnosis of serogroups commonly associated with invasive meningococcal disease, prompted a requirement for a consistent approach to the nomenclature of capsule genes. In this report, a comprehensive description of all N. meningitidis serogroups is provided, along with a proposed nomenclature, which was presented at the 2012 XVIIIth International Pathogenic Neisseria Conference.
PMCID: PMC3647402  PMID: 23628376
Neisseria meningitidis; capsule; serogroup; bacteria; nomenclature
19.  Ribosomal Multi-Locus Sequence Typing: universal characterisation of bacteria from domain to strain 
Microbiology (Reading, England)  2012;158(Pt 4):1005-1015.
No single characterisation scheme currently encompasses all levels of bacterial diversity, from domain to strain. We propose Ribosomal Multi Locus Sequence Typing (rMLST), an approach which indexes variation of the 53 genes encoding the bacterial ribosome protein subunits (rps genes), as a means of integrated microbial taxonomy and typing. As with MLST, rMLST employs curated reference sequences to identify gene variants efficiently and rapidly. The rps loci are ideal targets for a universal characterization scheme as they are: (i) present in all bacteria; (ii) distributed around the chromosome; and (iii) encode proteins which are under stabilising selection for functional conservation. Collectively, the rps loci exhibit variation that resolves bacteria into groups at all taxonomic and most typing levels, providing significantly more resolution than 16S small subunit rRNA gene phylogenies. A web-accessible expandable database, comprising whole genome data from more than 1900 bacterial isolates, including 28 draft genomes assembled de novo from the EBI sequence read archive, has been assembled. The rps gene variation catalogued in this database permits rapid and computationally non-intensive identification of the phylogenetic position of any bacterial sequence at the domain, phylum, class, order, family, genus, species and strain levels. The groupings generated with rMLST data are consistent with current nomenclature schemes and independent of the clustering algorithm used. This approach is applicable to the other domains of life, potentially providing a rational and universal approach to the classification of life that is based on one of its fundamental features, the translation mechanism.
PMCID: PMC3492749  PMID: 22282518
20.  Resolution of a Meningococcal Disease Outbreak from Whole-Genome Sequence Data with Rapid Web-Based Analysis Methods 
Journal of Clinical Microbiology  2012;50(9):3046-3053.
The increase in the capacity and reduction in cost of whole-genome sequencing methods present the imminent prospect of such data being used routinely in real time for investigations of bacterial disease outbreaks. For this to be realized, however, it is necessary that generic, portable, and robust analysis frameworks be available, which can be readily interpreted and used in real time by microbiologists, clinicians, and public health epidemiologists. We have achieved this with a set of analysis tools integrated into the website, which can in principle be used for the analysis of any pathogen. The approach is demonstrated with genomic data from isolates obtained during a well-characterized meningococcal disease outbreak at the University of Southampton, United Kingdom, that occurred in 1997. Whole-genome sequence data were collected, de novo assembled, and deposited into the PubMLST Neisseria BIGSdb database, which automatically annotated the sequences. This enabled the immediate and backwards-compatible classification of the isolates with a number of schemes, including the following: conventional, extended, and ribosomal multilocus sequence typing (MLST, eMLST, and rMLST); antigen gene sequence typing (AGST); analysis based on genes conferring antibiotic susceptibility. The isolates were also compared to a reference isolate belonging to the same clonal complex (ST-11) at 1,975 loci. Visualization of the data with the NeighborNet algorithm, implemented in SplitsTree 4 within the PubMLST website, permitted complete resolution of the outbreak and related isolates, demonstrating that multiple closely related but distinct strains were simultaneously present in asymptomatic carriage and disease, with two causing disease and one responsible for the outbreak itself.
PMCID: PMC3421817  PMID: 22785191
21.  A genomic approach to bacterial taxonomy: an examination and proposed reclassification of species within the genus Neisseria 
Microbiology  2012;158(Pt 6):1570-1580.
In common with other bacterial taxa, members of the genus Neisseria are classified using a range of phenotypic and biochemical approaches, which are not entirely satisfactory in assigning isolates to species groups. Recently, there has been increasing interest in using nucleotide sequences for bacterial typing and taxonomy, but to date, no broadly accepted alternative to conventional methods is available. Here, the taxonomic relationships of 55 representative members of the genus Neisseria have been analysed using whole-genome sequence data. As genetic material belonging to the accessory genome is widely shared among different taxa but not present in all isolates, this analysis indexed nucleotide sequence variation within sets of genes, specifically protein-coding genes that were present and directly comparable in all isolates. Variation in these genes identified seven species groups, which were robust to the choice of genes and phylogenetic clustering methods used. The groupings were largely, but not completely, congruent with current species designations, with some minor changes in nomenclature and the reassignment of a few isolates necessary. In particular, these data showed that isolates classified as Neisseria polysaccharea are polyphyletic and probably include more than one taxonomically distinct organism. The seven groups could be reliably and rapidly generated with sequence variation within the 53 ribosomal protein subunit (rps) genes, further demonstrating that ribosomal multilocus sequence typing (rMLST) is a practicable and powerful means of characterizing bacteria at all levels, from domain to strain.
PMCID: PMC3541776  PMID: 22422752
22.  A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter 
Genes  2012;3(2):261-277.
Campylobacteriosis remains a major human public health problem world-wide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, particularly the characterization of Campylobacter genotypes isolated from human infection, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infection in countries such as the UK. The availability of multiple whole genome sequences from Campylobacter isolates presents the prospect of identifying those genes or allelic variants responsible for host-association and increased human disease risk, but the diversity of Campylobacter genomes presents challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the website.
PMCID: PMC3902793  PMID: 24704917
Campylobacter jejuni; Campylobacter coli; campylobacteriosis; whole genome sequencing; next generation sequencing; genome analysis
23.  Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain 
Microbiology  2012;158(Pt 4):1005-1015.
No single genealogical reconstruction or typing method currently encompasses all levels of bacterial diversity, from domain to strain. We propose ribosomal multilocus sequence typing (rMLST), an approach which indexes variation of the 53 genes encoding the bacterial ribosome protein subunits (rps genes), as a means of integrating microbial genealogy and typing. As with multilocus sequence typing (MLST), rMLST employs curated reference sequences to identify gene variants efficiently and rapidly. The rps loci are ideal targets for a universal characterization scheme as they are: (i) present in all bacteria; (ii) distributed around the chromosome; and (iii) encode proteins which are under stabilizing selection for functional conservation. Collectively, the rps loci exhibit variation that resolves bacteria into groups at all taxonomic and most typing levels, providing significantly more resolution than 16S small subunit rRNA gene phylogenies. A web-accessible expandable database, comprising whole-genome data from more than 1900 bacterial isolates, including 28 draft genomes assembled de novo from the European Bioinformatics Institute (EBI) sequence read archive, has been assembled. The rps gene variation catalogued in this database permits rapid and computationally non-intensive identification of the phylogenetic position of any bacterial sequence at the domain, phylum, class, order, family, genus, species and strain levels. The groupings generated with rMLST data are consistent with current nomenclature schemes and independent of the clustering algorithm used. This approach is applicable to the other domains of life, potentially providing a rational and universal approach to the classification of life that is based on one of its fundamental features, the translation mechanism.
PMCID: PMC3492749  PMID: 22282518
24.  Changes in Serogroup and Genotype Prevalence Among Carried Meningococci in the United Kingdom During Vaccine Implementation 
The Journal of Infectious Diseases  2011;204(7):1046-1053.
Background. Herd immunity is important in the effectiveness of conjugate polysaccharide vaccines against encapsulated bacteria. A large multicenter study investigated the effect of meningococcal serogroup C conjugate vaccine introduction on the meningococcal population.
Methods. Carried meningococci in individuals aged 15–19 years attending education establishments were investigated before and for 2 years after vaccine introduction. Isolates were characterized by multilocus sequence typing, serogroup, and capsular region genotype and changes in phenotypes and genotypes assessed.
Results. A total of 8462 meningococci were isolated from 47 765 participants (17.7%). Serogroup prevalence was similar over the 3 years, except for decreases of 80% for serogroup C and 40% for serogroup 29E. Clonal complexes were associated with particular serogroups and their relative proportions fluctuated, with 12 statistically significant changes (6 up, 6 down). The reduction of ST-11 complex serogroup C meningococci was probably due to vaccine introduction. Reasons for a decrease in serogroup 29E ST-254 meningococci (from 1.8% to 0.7%) and an increase in serogroup B ST-213 complex meningococci (from 6.7% to 10.6%) were less clear.
Conclusions. Natural fluctuations in carried meningococcal genotypes and phenotypes can be affected by the use of conjugate vaccines, and not all of these changes are anticipatable in advance of vaccine introduction.
PMCID: PMC3164428  PMID: 21881120
25.  Clinical Clostridium difficile: Clonality and Pathogenicity Locus Diversity 
PLoS ONE  2011;6(5):e19993.
Clostridium difficile infection (CDI) is an important cause of mortality and morbidity in healthcare settings. The major virulence determinants are large clostridial toxins, toxin A (tcdA) and toxin B (tcdB), encoded within the pathogenicity locus (PaLoc). Isolates vary in pathogenicity from hypervirulent PCR-ribotypes 027 and 078 with high mortality, to benign non-toxigenic strains carried asymptomatically. The relative pathogenicity of most toxigenic genotypes is still unclear, but may be influenced by PaLoc genetic variant. This is the largest study of C. difficile molecular epidemiology performed to date, in which a representative collection of recent isolates (n = 1290) from patients with CDI in Oxfordshire, UK, was genotyped by multilocus sequence typing. The population structure was described using NeighborNet and ClonalFrame. Sequence variation within toxin B (tcdB) and its negative regulator (tcdC), was mapped onto the population structure. The 69 Sequence Types (ST) showed evidence for homologous recombination with an effect on genetic diversification four times lower than mutation. Five previously recognised genetic groups or clades persisted, designated 1 to 5, each having a strikingly congruent association with tcdB and tcdC variants. Hypervirulent ST-11 (078) was the only member of clade 5, which was divergent from the other four clades within the MLST loci. However, it was closely related to the other clades within the tcdB and tcdC loci. ST-11 (078) may represent a divergent formerly non-toxigenic strain that acquired the PaLoc (at least) by genetic recombination. This study focused on human clinical isolates collected from a single geographic location, to achieve a uniquely high density of sampling. It sets a baseline of MLST data for future comparative studies investigating genotype virulence potential (using clinical severity data for these isolates), possible reservoirs of human CDI, and the evolutionary origins of hypervirulent strains.
PMCID: PMC3098275  PMID: 21625511
