Bacterial species comprise related genotypes that can display divergent phenotypes with important clinical implications. Staphylococcus epidermidis is a common cause of nosocomial infections, and critical to its pathogenesis is its ability to adhere to surfaces and form biofilms, thereby moderating the effect of the host’s immune response and of antibiotics. Commensal S. epidermidis populations are thought to differ from those associated with disease in factors involved in adhesion and biofilm accumulation. We quantified differences in biofilm formation among 98 S. epidermidis isolates from various sources, and investigated population structure based on ribosomal multilocus sequence typing (rMLST) and the presence or absence of genes involved in adhesion and biofilm formation. All isolates were able to adhere and form biofilms in in vitro growth assays, and confocal microscopy allowed classification into five biofilm morphotypes based on thickness, biovolume and roughness. Phylogenetic reconstruction grouped isolates into three separate clades, with isolates in the main disease-associated clade displaying diverse morphotypes. Of the biofilm morphology characteristics, only biofilm thickness was significantly associated with clade distribution. The distribution of some known adhesion-associated genes (aap and sesE) among isolates was significantly associated with the species’ clonal frame. These data challenge the assumption that biofilm-associated genes, such as those of the ica operon, are genetic markers for less invasive S. epidermidis isolates, and suggest that phenotypic characteristics such as adhesion and biofilm formation are not fixed by clonal descent but are influenced by the presence of various genes that are mobile among lineages.
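Testing whether a gene's presence or absence is associated with clade membership is a contingency-table problem. The abstract does not say which test was used; the sketch below assumes a Pearson chi-square test of independence on a gene present/absent by three-clade table, and the counts are hypothetical. With a 2 x 3 table there are 2 degrees of freedom, for which the chi-square survival function has the exact closed form exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_2x3(table):
    """Pearson chi-square test of independence for a 2 x 3 contingency table.

    Rows: gene present / gene absent. Columns: the three clades.
    With (2 - 1) * (3 - 1) = 2 degrees of freedom the chi-square survival
    function is exactly exp(-x / 2).
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2 x 3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("an empty row or column gives a zero expected count")
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2)  # exact only because df = 2
    return chi2, p_value


# Hypothetical counts for one gene across clades 1-3 (not data from the study).
counts = [[30, 5, 2],    # gene present
          [10, 25, 26]]  # gene absent
chi2, p = chi_square_2x3(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

For tables with small expected counts, Fisher's exact test would be the more conservative choice.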
Neisseria meningitidis is a leading cause of meningitis and septicaemia. The hyperinvasive ST-11 clonal complex (cc11) caused serogroup C (MenC) outbreaks in the US military in the 1960s and UK universities in the 1990s, a global Hajj-associated serogroup W (MenW) outbreak in 2000–2001, and subsequent MenW epidemics in sub-Saharan Africa. More recently, endemic MenW disease has expanded in South Africa, South America and the UK, and MenC cases have been reported among European and North American men who have sex with men (MSM). Routine typing schemes poorly resolve cc11, so we established its population structure at genomic resolution.
Representatives of these episodes and other geo-temporally diverse cc11 meningococci (n = 750) were compared across 1546 core genes and visualised on phylogenetic networks.
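Comparing isolates across a set of core genes in a gene-by-gene framework amounts to comparing allele numbers locus by locus. The sketch below assumes each isolate is represented as a dict of locus name to allele ID, with None for loci that could not be called (a hypothetical representation), and computes the proportion of differing loci for each pair. A distance matrix of this kind could feed a network-building tool; that step is not shown.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci whose allele IDs differ, skipping loci missing in either isolate.

    Each profile is a dict mapping locus name -> allele ID (None = not called).
    Returns (number_of_differences, number_of_loci_compared).
    """
    shared = [locus for locus, allele in profile_a.items()
              if allele is not None and profile_b.get(locus) is not None]
    differences = sum(profile_a[locus] != profile_b[locus] for locus in shared)
    return differences, len(shared)


def distance_matrix(profiles):
    """Proportion of compared loci that differ, for every pair of isolates."""
    distances = {}
    for (name_a, prof_a), (name_b, prof_b) in combinations(profiles.items(), 2):
        diffs, compared = allelic_distance(prof_a, prof_b)
        distances[(name_a, name_b)] = diffs / compared if compared else float("nan")
    return distances


# Hypothetical three-locus profiles; a real comparison would use the core loci.
profiles = {
    "isolate_1": {"locus_1": 1, "locus_2": 4, "locus_3": 7},
    "isolate_2": {"locus_1": 1, "locus_2": 5, "locus_3": None},
    "isolate_3": {"locus_1": 2, "locus_2": 5, "locus_3": 9},
}
for pair, d in distance_matrix(profiles).items():
    print(pair, round(d, 3))
```

Skipping missing loci, rather than counting them as differences, keeps incomplete assemblies from inflating distances.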
MenW isolates were confined to a distal portion of one of two main lineages with MenB and MenC isolates interspersed elsewhere. An expanding South American/UK MenW strain was distinct from the ‘Hajj outbreak’ strain and a closely related endemic South African strain. Recent MenC isolates from MSM in France and the UK were closely related but distinct.
High resolution ‘genomic’ multilocus sequence typing is necessary to resolve and monitor the spread of diverse cc11 lineages globally.
• The meningococcal ST-11 clonal complex is diverse.
• A ‘South American’ serogroup W strain is currently expanding in the UK.
• Supports decision to vaccinate UK teenagers against meningococcal serogroup W.
• A distinct endemic South African strain is related to the 2000 Hajj outbreak strain.
• Serogroup C cases among MSM in UK and France are related but distinct.
Meningococcal; ST-11 clonal complex; Genome; Serogroup W; Serogroup C
The pneumococcus is a leading pathogen infecting children and adults. Safe, effective vaccines exist; they work by inducing antibodies to the polysaccharide capsule (unique to each serotype) that surrounds the cell. However, current vaccines are limited because only a few of the nearly 100 antigenically distinct serotypes are included in the formulations. Of these, serogroup 6 pneumococci are a frequent cause of serious disease and common colonizers of the nasopharynx in children. Serotype 6E was first reported in 2004 but was thought to be rare; however, we and others have detected serotype 6E among recent pneumococcal collections. We therefore analyzed a diverse data set of ∼1,000 serogroup 6 genomes, assessed the prevalence and distribution of serotype 6E, analyzed the genetic diversity among serogroup 6 pneumococci, and investigated whether pneumococcal conjugate vaccine-induced serotype 6A and 6B antibodies mediate the killing of serotype 6E pneumococci. We found that 43% of all genomes were of serotype 6E, and these were recovered worldwide from healthy children and from patients of all ages with pneumococcal disease. Four genetic lineages, three of which were multidrug resistant, accounted for ∼90% of the serotype 6E pneumococci. Serological assays demonstrated that vaccine-induced serotype 6B antibodies mediated killing of serotype 6E pneumococci. We also revealed three major genetic clusters of serotype 6A capsular sequences, discovered a new hybrid 6C/6E serotype, and identified 44 examples of serotype switching. Therefore, while vaccines appear to offer protection against serotype 6E, genetic variants may reduce vaccine efficacy in the longer term through the emergence of serotypes that can evade vaccine-induced immunity.
Invasive meningococcal disease (IMD) caused by Neisseria meningitidis serogroup Y has increased in Europe, especially in Scandinavia. In Sweden, serogroup Y is now the dominant serogroup, and in 2012 the serogroup Y disease incidence was 0.46/100,000 population. We previously showed that a strain type belonging to sequence type 23 was responsible for the increased prevalence of this serogroup in Sweden. The objective of this study was to investigate the emergence of serogroup Y by whole-genome sequencing and to compare the population structure of Swedish invasive serogroup Y strains with that of strains from countries with different IMD incidence. Whole-genome sequencing was performed on invasive serogroup Y isolates collected in Sweden from 1995 to 2012 (n = 186). These isolates were compared with a collection of serogroup Y isolates from England, Wales, and Northern Ireland from 2010 to 2012 (n = 143), where serogroup Y incidence was relatively low, and with two isolates obtained in 1999 in the United States, where serogroup Y remains one of the major causes of IMD. The meningococcal population structures were similar in the investigated regions; however, different strain types were prevalent in each geographic region. A number of genes known or hypothesized to affect meningococcal virulence were associated with particular strain types and subtypes. The reasons for the IMD increase are multifactorial, involving increased virulence, host adaptive immunity, and transmission. Future genome-wide association studies are needed to reveal additional genes associated with serogroup Y meningococcal disease, and this work would benefit from a complete serogroup Y meningococcal reference genome.
The opportunistic pathogens Staphylococcus aureus and Staphylococcus epidermidis are major causes of severe nosocomial infection and are associated with high levels of mortality and morbidity worldwide. Both species are common commensals on the human skin and in the nasal pharynx, but they are genetically distinct, with 24% average nucleotide divergence across 1,478 core genes. To better understand the genome dynamics of these ecologically similar staphylococcal species, we carried out a comparative analysis of 324 S. aureus and S. epidermidis genomes, including 83 novel S. epidermidis sequences. A reference pan-genome approach and whole genome multilocus sequence typing revealed that around half of the genome was shared between the species. Based on a BratNextGen analysis, homologous recombination was found to have affected 40% of the core genes in S. epidermidis but only 24% of the core genes in S. aureus. Homologous recombination between the species is rare, with a maximum of nine gene alleles shared between any pair of S. epidermidis and S. aureus isolates. In contrast, there was considerable interspecies admixture of mobile elements, in particular genes associated with the SaPIn1 pathogenicity island, metal detoxification, and the methicillin-resistance island SCCmec. Our data and analysis provide a context for considering the nature of recombinational boundaries between S. aureus and S. epidermidis and the selective forces that influence realized recombination between these species.
Staphylococcus; evolution; ecology; recombination; nosocomial infections
Neisseria adhesin A (NadA), involved in the adhesion and invasion of Neisseria meningitidis into host tissues, is one of the major components of Bexsero, a novel multicomponent vaccine licensed for protection against meningococcal serogroup B in Europe, Australia, and Canada. NadA has been identified in approximately 30% of clinical isolates and in a much lower proportion of carrier isolates. Three protein variants were originally identified in invasive meningococci and named NadA-1, NadA-2, and NadA-3, whereas most carrier isolates either lacked the gene or harbored a different variant, NadA-4. Further analysis of isolates belonging to the sequence type 213 (ST-213) clonal complex identified NadA-5, which was structurally similar to NadA-4, but more distantly related to NadA-1, -2, and -3. At the time of this writing, more than 89 distinct nadA allele sequences and 43 distinct peptides have been described. Here, we present a revised nomenclature system, taking into account the complete data set, which is compatible with previous classification schemes and is expandable. The main features of this new scheme include (i) the grouping of the previously named NadA-2 and NadA-3 variants into a single NadA-2/3 variant, (ii) the grouping of the previously assigned NadA-4 and NadA-5 variants into a single NadA-4/5 variant, (iii) the introduction of an additional variant (NadA-6), and (iv) the classification of the variants into two main groups, named groups I and II. To facilitate querying of the sequences and submission of new allele sequences, the nucleotide and amino acid sequences are available at http://pubmlst.org/neisseria/NadA/.
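Recoding existing isolate records from the previous variant names to the revised scheme is a lookup. The sketch below encodes only the merges stated above (NadA-2 and NadA-3 become NadA-2/3; NadA-4 and NadA-5 become NadA-4/5; NadA-1 is unchanged; NadA-6 is new). Assignment of variants to groups I and II is deliberately left out because it is not spelled out here; consult the PubMLST NadA pages for that. The isolate names are hypothetical.

```python
# Merges stated in the revised scheme; group I/II membership is not encoded.
LEGACY_TO_REVISED = {
    "NadA-1": "NadA-1",
    "NadA-2": "NadA-2/3",
    "NadA-3": "NadA-2/3",
    "NadA-4": "NadA-4/5",
    "NadA-5": "NadA-4/5",
    "NadA-6": "NadA-6",  # introduced in the revised scheme
}


def revise_variant(name: str) -> str:
    """Translate a variant name to the revised nomenclature."""
    try:
        return LEGACY_TO_REVISED[name.strip()]
    except KeyError:
        raise ValueError(f"no revised mapping for {name!r}") from None


def recode(isolate_variants: dict) -> dict:
    """Recode a hypothetical isolate -> legacy-variant table."""
    return {isolate: revise_variant(v) for isolate, v in isolate_variants.items()}


print(recode({"isolate_A": "NadA-3", "isolate_B": "NadA-5"}))
# {'isolate_A': 'NadA-2/3', 'isolate_B': 'NadA-4/5'}
```

Raising on unknown names, rather than passing them through, makes unmapped legacy labels visible instead of silently mixing two nomenclatures.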
Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. Data are therefore needed in a uniform format that can be easily assessed for quality, linked to provenance and phenotype, and used for analysis.
The performance of de novo short-read assembly followed by automatic annotation using the PubMLST.org Neisseria database was evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and less than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Core genome comparisons and ribosomal protein gene analysis yielded genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing, and revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.
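The core genome definition used here, loci present in at least 95% of the population, can be applied directly to a presence/absence table. This is a minimal sketch assuming each isolate is given as a set of detected locus names (hypothetical data); it is not the Genome Comparator implementation.

```python
from collections import Counter


def core_loci(presence, threshold=0.95):
    """Return loci detected in at least `threshold` of the isolates, sorted by name.

    presence: dict mapping isolate ID -> set of loci detected in that genome.
    """
    n_isolates = len(presence)
    if n_isolates == 0:
        return []
    counts = Counter(locus for loci in presence.values() for locus in loci)
    return sorted(locus for locus, c in counts.items() if c / n_isolates >= threshold)


# Hypothetical data: 20 isolates; locus_A in all, locus_B in 19, locus_C in 18.
presence = {f"isolate_{i}": {"locus_A"} for i in range(20)}
for i in range(19):
    presence[f"isolate_{i}"].add("locus_B")
for i in range(18):
    presence[f"isolate_{i}"].add("locus_C")
print(core_loci(presence))  # ['locus_A', 'locus_B']
```

A threshold below 100% tolerates loci that are missed in a few draft assemblies, which would otherwise drop out of a strict all-genomes core.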
The de novo assembly, combined with automated gene-by-gene annotation, generates high quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease-causing lineages.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1138) contains supplementary material, which is available to authorized users.
Neisseria meningitidis; de novo assembly; BIGSdb; Gene-by-gene analysis; cgMLST; rMLST; rST; Bacterial population genomics
Following the association of Cronobacter spp. with several publicized fatal outbreaks of meningitis and necrotising enterocolitis in neonatal intensive care units, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter, which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database (http://pubmlst.org/cronobacter/), containing over 1000 isolates with metadata, and to the recognition of specific clonal lineages linked to neonatal meningitis and adult infections.
Whole genome sequencing and multilocus sequence typing (MLST) have supported the formal recognition of the genus Cronobacter, composed of seven species, to replace the former single species Enterobacter sakazakii. Applying the 7-locus MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal MLST (51 loci) and whole genome MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains.
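In a 7-locus MLST scheme, a sequence type (ST) is defined by the exact combination of seven allele numbers, so assignment is a table lookup. Single-locus variants (STs differing at one of the seven loci) are the relationship commonly used when grouping STs into clonal complexes, although a database's own clonal complex definitions may differ. In this sketch the locus names are those commonly given for the Cronobacter scheme (verify against the database), and the profiles and ST numbers are placeholders, not real definitions.

```python
from typing import Dict, List, Optional, Tuple

LOCI = ("atpD", "fusA", "glnS", "gltB", "gyrB", "infB", "ppsA")
Profile = Tuple[int, ...]


def profile_key(alleles: Dict[str, int]) -> Optional[Profile]:
    """Order allele numbers by locus; None if any locus is missing."""
    if any(locus not in alleles for locus in LOCI):
        return None
    return tuple(alleles[locus] for locus in LOCI)


def assign_st(alleles: Dict[str, int], profiles: Dict[Profile, int]) -> Optional[int]:
    """Exact-match lookup: an ST is defined by the full seven-allele combination."""
    key = profile_key(alleles)
    return None if key is None else profiles.get(key)


def single_locus_variants(alleles: Dict[str, int], profiles: Dict[Profile, int]) -> List[int]:
    """STs differing from this profile at exactly one of the seven loci."""
    key = profile_key(alleles)
    if key is None:
        return []
    return sorted(st for prof, st in profiles.items()
                  if sum(a != b for a, b in zip(prof, key)) == 1)


# Placeholder profile definitions, not real Cronobacter STs.
placeholder_profiles = {(1, 1, 1, 1, 1, 1, 1): 101,
                        (1, 1, 1, 1, 1, 1, 2): 102,
                        (2, 2, 2, 2, 2, 2, 2): 103}
query = dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1)))
print(assign_st(query, placeholder_profiles))              # 101
print(single_locus_variants(query, placeholder_profiles))  # [102]
```

Returning None for an incomplete or unseen profile mirrors the need for curators to assign new ST numbers rather than guessing one.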
The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes ‘on the fly’ and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable, adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore, it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen. Whole genome sequencing has additionally revealed a range of potential virulence and environmental fitness traits which may account for the pathogenicity of C. sakazakii CC4 and its propensity to cause neonatal CNS infection.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1121) contains supplementary material, which is available to authorized users.
Emergent bacterial pathogen; Cronobacter; MLST; Genomic analysis
Lactobacillus acidophilus is a Gram-positive lactic acid bacterium that has had widespread historical use in the dairy industry and more recently as a probiotic. Although L. acidophilus has been designated as safe for human consumption, increasing commercial regulation and clinical demands for probiotic validation have created a need to understand its genetic diversity. By drawing on large, well-characterised collections of lactic acid bacteria, we examined L. acidophilus isolates spanning 92 years and including multiple strains in current commercial use. Analysis of the whole genome sequence data set (34 isolate genomes) demonstrated that L. acidophilus is a low-diversity, monophyletic species, with commercial isolates essentially identical at the sequence level. Our results indicate that commercial use has domesticated L. acidophilus, with genetically stable, invariant strains being consumed globally by the human population.
Homologous recombination between bacterial strains is theoretically capable of preventing the separation of daughter clusters and producing cohesive clouds of genotypes in sequence space. However, numerous barriers to recombination are known. Barriers may be essential, such as adaptive incompatibility, or ecological, arising from the opportunities for recombination in the natural habitat. Campylobacter jejuni is a gut colonizer of numerous animal species and a major human enteric pathogen. We demonstrate that the two major generalist lineages of C. jejuni do not show evidence of recombination with each other in nature, despite having a high degree of host niche overlap and recombining extensively with specialist lineages. However, transformation experiments show that the generalist lineages readily recombine with one another in vitro. This suggests ecological rather than essential barriers to recombination, caused by a cryptic niche structure within the hosts.
adaptation; Campylobacter; genomics; recombination barriers
Bacterial identification and characterization at the subspecies level is commonly known as microbial typing. These methodologies are currently fundamental tools in clinical microbiology and bacterial population genetics, used to track outbreaks and to study the dissemination and evolution of virulence or pathogenicity factors and antimicrobial resistance. Owing to advances in DNA sequencing technology, these methods have increasingly become sequence-based. A common understanding of the concepts described, and the ability to share results within the community at a global level, are increasingly important requisites for the continued development of portable and accurate sequence-based typing methods, especially with the recent introduction of Next Generation Sequencing (NGS) technologies. In this paper, we present an ontology designed for the sequence-based microbial typing field, capable of describing any of the sequence-based typing methodologies currently in use and being developed, including novel NGS-based methods. This is a fundamental step towards accurately describing, analyzing, curating, and managing information for sequence-based microbial typing.
Ontology; Knowledge representation; Microbial typing methods
The bacterial core genome is of intense interest, and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study; for example, 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model for estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the current lack of knowledge about many core genes in bacterial genomes of significant global public health importance.
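The p-distance and genome-location steps can be sketched as follows; the Bayesian decision model for choosing core genes is not reproduced. The p-distance between two aligned sequences is the proportion of compared sites that differ. Skipping gap columns is an assumption, since gap handling is not specified here, and the gene names, positions and alignments are hypothetical.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    Returns NaN if no column can be compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += a != b
    return differences / compared if compared else float("nan")


def median_p_distance(alignment):
    """Median p-distance over all pairs of sequences in one gene alignment."""
    return median(p_distance(a, b) for a, b in combinations(alignment, 2))


def genes_by_location(genes):
    """genes: iterable of (gene_name, start_position, alignment).

    Returns (gene_name, start_position, median p-distance) sorted by position,
    ready to plot against genome coordinates.
    """
    return [(name, start, median_p_distance(aln))
            for name, start, aln in sorted(genes, key=lambda g: g[1])]


genes = [("gene_2", 5000, ["ACGTACGT", "ACGTACGA", "ACGAACGA"]),
         ("gene_1", 100, ["AAAA", "AAAT", "AATT"])]
for row in genes_by_location(genes):
    print(row)
```

Using the median rather than the mean keeps a single divergent isolate from dominating a gene's value.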
Whole genome sequencing has revolutionised the study of pathogenic microorganisms. It has also become so affordable that hundreds of samples can reasonably be sequenced in a single project, creating a wealth of data. Estimating the bacterial core genome (traditionally defined as those genes present in all genomes) is an important initial step in population genomics analyses. We developed a simple statistical model to estimate the number of core genes in a bacterial genome dataset, calculated pairwise evolutionary distances (p-distances) based on differences among nucleotide sequences, and plotted the median p-distance for each core gene relative to its genome location. Low p-distance values indicate highly conserved genes; high values suggest genes under selection and/or undergoing recombination. The genome diagrams depict areas of interest in genomes that can be explored in further detail. Using our method, we analysed five bacterial species comprising a total of 2096 genomes. This revealed new information related to antibiotic resistance and virulence for two bacterial species and demonstrated that the function of many core genes in bacteria is still unknown. Our model provides a highly accessible, publicly available tool to use on the vast quantities of genome sequence data now available.
New vaccines targeting meningococci expressing serogroup B polysaccharide have been developed, some of which are licensed in Europe. Coverage depends on the distribution of disease-associated genotypes, which may vary by age. It is well established that a small number of hyperinvasive lineages account for most disease, and these lineages are associated with particular antigens, including vaccine candidates. A collection of 4,048 representative meningococcal disease isolates from 18 European countries, collected over a 3-year period, was characterized by multilocus sequence typing (MLST). Age data were available for 3,147 isolates. The proportions of hyperinvasive lineages, identified as particular clonal complexes (ccs) by MLST, differed among age groups. Subjects <1 year of age had a lower risk of sequence type 11 (ST-11) cc, ST-32 cc, and ST-269 cc disease and a higher risk of disease due to unassigned STs; 1- to 4-year-olds had a lower risk of ST-11 cc and ST-32 cc disease; 5- to 14-year-olds were less likely to experience ST-11 cc and ST-269 cc disease; and ≥25-year-olds were more likely to experience disease due to less common ccs and unassigned STs. Younger and older subjects were vulnerable to a more diverse set of genotypes, indicating that the genotypes affecting adolescents and young adults are more clonal. Knowledge of the temporal and spatial diversity and the dynamics of meningococcal populations is essential for disease control by vaccines, as coverage is lineage specific. The nonrandom age distribution of hyperinvasive lineages has consequences for the design and implementation of vaccines, as different variants, or perhaps targets, may be required for different age groups.
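The comparison above is expressed as proportions and risks per age group. One common way to put a number on "a more diverse set of genotypes" is Simpson's index of diversity over clonal complex labels; this is our illustrative choice, not a method stated in the abstract, and the records below are hypothetical.

```python
from collections import Counter, defaultdict


def simpsons_diversity(labels):
    """Simpson's index of diversity: 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)).

    0 means every isolate shares one genotype; values near 1 mean high diversity.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two isolates")
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def diversity_by_age_group(records):
    """records: iterable of (age_group, clonal_complex) pairs."""
    groups = defaultdict(list)
    for age_group, cc in records:
        groups[age_group].append(cc)
    return {group: simpsons_diversity(ccs) for group, ccs in groups.items()}


# Hypothetical records, not data from the study.
records = [("15-19", "ST-11 cc"), ("15-19", "ST-11 cc"), ("15-19", "ST-32 cc"),
           ("<1", "ST-41/44 cc"), ("<1", "unassigned"), ("<1", "ST-269 cc")]
print(diversity_by_age_group(records))
```

Simpson's index is the probability that two isolates drawn without replacement from a group belong to different clonal complexes, which makes it easy to interpret across groups of different sizes.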
The comparison of 16S rRNA gene sequences is widely used to differentiate bacteria; however, this gene can lack resolution among closely related but distinct members of the same genus. This is a problem in clinical settings for genera such as Neisseria, in which some species are associated with disease while others are not. Here, we identified and validated an alternative genetic target, common to all Neisseria species, which can be readily sequenced to provide an assay that rapidly and accurately discriminates among members of the genus. Ribosomal multilocus sequence typing (rMLST) using ribosomal protein genes has been shown to unambiguously identify these bacteria. The PubMLST Neisseria database (http://pubmlst.org/neisseria/) was queried to extract the 53 ribosomal protein gene sequences from 44 genomes from diverse species. Phylogenies reconstructed from these genes were examined, and a single 413-bp fragment of the 50S ribosomal protein L6 (rplF) gene was identified which produced a phylogeny congruent with the phylogeny reconstructed from concatenated ribosomal protein genes. Primers that enabled the amplification and direct sequencing of the rplF gene fragment were designed to validate the assay in vitro and in silico. Allele sequences were defined for the gene fragment, associated with particular species names, and stored on the PubMLST Neisseria database, providing a curated electronic resource. This approach provides an alternative to 16S rRNA gene sequencing, which can be readily replicated for other organisms for which more resolution is required, and it has potential applications in high-resolution metagenomic studies.
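Because each rplF allele is curated and linked to a species name, identifying a sequenced fragment is first an exact-match lookup against the allele definitions. The sketch below assumes the definitions are available as a dict of allele sequence to (allele ID, species) and, as an illustrative extension not described in the text, falls back to reporting the closest equal-length allele by mismatch count. The 8-bp toy alleles and species labels are hypothetical stand-ins for the 413-bp fragment.

```python
def identify_species(fragment, allele_table):
    """Match a sequenced fragment against curated allele definitions.

    allele_table: dict mapping allele sequence -> (allele_id, species_name).
    Returns (allele_id, species_name, mismatches): an exact match has 0
    mismatches; otherwise the closest equal-length allele is reported, which
    is only a hint and not a curated species call. Returns None if no allele
    has the same length.
    """
    seq = "".join(fragment.split()).upper()  # drop whitespace/newlines
    if seq in allele_table:
        allele_id, species = allele_table[seq]
        return allele_id, species, 0
    best = None
    for allele_seq, (allele_id, species) in allele_table.items():
        if len(allele_seq) != len(seq):
            continue
        mismatches = sum(a != b for a, b in zip(allele_seq, seq))
        if best is None or mismatches < best[2]:
            best = (allele_id, species, mismatches)
    return best


# Toy 8-bp "alleles" standing in for the 413-bp rplF fragment.
toy_table = {"ACGTACGT": (1, "species_A"), "ACGTTCGA": (2, "species_B")}
print(identify_species("acgt acgt", toy_table))  # (1, 'species_A', 0)
print(identify_species("ACGTTCGG", toy_table))   # (2, 'species_B', 1)
```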
Multilocus sequence analysis of 417 strains of Yersinia pseudotuberculosis revealed that it is a complex of four populations: three that have previously been assigned species status [Y. pseudotuberculosis sensu stricto (s.s.), Yersinia pestis and Yersinia similis] and a fourth population, referred to here as the Korean group, which may be in the process of speciation. We detected clear signs of recombination within Y. pseudotuberculosis s.s. as well as imports from Y. similis and the Korean group. The sources of genetic diversification within Y. pseudotuberculosis s.s. were approximately equally divided between recombination and mutation, whereas recombination has not yet been demonstrated in Y. pestis, which is also much more genetically monomorphic than Y. pseudotuberculosis s.s. Most Y. pseudotuberculosis s.s. strains belong to a diffuse group of sequence types lacking clear population structure, although this species contains a melibiose-negative clade that is present globally in domesticated animals. Yersinia similis corresponds to the previously identified Y. pseudotuberculosis genetic type G4, which is probably not pathogenic because it lacks the virulence factors typical of Y. pseudotuberculosis s.s. In contrast, Y. pseudotuberculosis s.s., the Korean group and Y. pestis can all cause disease in humans.
Multilocus sequence typing (MLST) was proposed in 1998 as a portable sequence-based method for identifying clonal relationships among bacteria. Today, in the whole-genome era of microbiology, the need for systematic, standardized descriptions of bacterial genotypic variation remains a priority. Here, to meet this need, we draw on the successes of MLST and 16S rRNA gene sequencing to propose a hierarchical gene-by-gene approach that reflects functional and evolutionary relationships and catalogues bacteria ‘from domain to strain’. Our gene-based typing approach using online platforms such as the Bacterial Isolate Genome Sequence Database (BIGSdb) allows the scalable organization and analysis of whole-genome sequence data.
Whole-cell matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) is a rapid method for identification of microorganisms that is increasingly used in microbiology laboratories. This identification is based on comparing the mass spectrum of the tested isolate with reference databases. Using Neisseria meningitidis as a model organism, we showed that in one of the available databases, the Andromas database, 10 of the 13 species-specific biomarkers correspond to ribosomal proteins. Remarkably, one biomarker, ribosomal protein L32, was subject to inter-strain variability. Analysis of the ribosomal protein patterns of 100 isolates for which whole genome sequences were available confirmed inter-strain variability in the molecular weight of 29 ribosomal proteins, establishing a correlation between the sequence type (ST) and/or clonal complex (CC) of each strain and its ribosomal protein pattern. Because the molecular weights of three of the variable ribosomal proteins (L30, L31 and L32) fall within the spectral window observed by MALDI-TOF MS in clinical microbiology (3640–12000 m/z), analysing the molecular weights of these three proteins allowed us to classify each strain into one of six subgroups, each corresponding to specific STs and/or CCs. Their detection by MALDI-TOF MS therefore allows rapid typing of N. meningitidis isolates.
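Classifying a strain from its L30, L31 and L32 peaks amounts to checking which reference combination of three marker masses is fully present in the observed spectrum. In this sketch the subgroup masses are invented placeholders (not the published values), peaks and references are both treated as singly charged m/z values, and the 500 ppm matching tolerance is an assumption.

```python
# Placeholder reference table: subgroup -> expected m/z of the L30, L31 and L32
# peaks. These numbers are invented for illustration, not published masses.
WINDOW = (3640.0, 12000.0)  # clinical MALDI-TOF spectral window (m/z)


def peak_matches(peaks, target, tol_ppm):
    """True if any observed peak lies within tol_ppm of the target m/z."""
    tolerance = target * tol_ppm / 1e6
    return any(abs(p - target) <= tolerance for p in peaks)


def classify(peaks, subgroups, tol_ppm=500.0, window=WINDOW):
    """Return every subgroup whose three marker peaks are all observed.

    tol_ppm is an assumed matching tolerance. An empty result means no
    subgroup matched; more than one result means the markers are ambiguous.
    """
    lo, hi = window
    observed = [p for p in peaks if lo <= p <= hi]  # ignore peaks outside the window
    return [name for name, markers in subgroups.items()
            if all(peak_matches(observed, m, tol_ppm) for m in markers)]


placeholder_subgroups = {
    "subgroup_1": (5000.0, 6000.0, 7000.0),
    "subgroup_2": (5000.0, 6010.0, 7000.0),
}
spectrum = [2500.0, 5000.5, 6000.2, 7001.0, 9000.0]
print(classify(spectrum, placeholder_subgroups))  # ['subgroup_1']
```

Returning a list rather than a single label keeps ambiguous spectra visible instead of forcing a call.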
Mass spectrometry; Ribosomal proteins; Biomarkers; Neisseria meningitidis
Whole genome sequence (WGS) data are becoming a major means of characterising samples of bacterial pathogens. These data have the advantage of providing detailed information on the genotypes and likely phenotypes of aetiological agents, enabling the relationships of samples from potential disease outbreaks to be established precisely. However, generating ever-increasing quantities of sequence data does not, in itself, resolve the problems that a wide variety of microbiological typing methods have addressed over the last 100 years or so; indeed, very high volumes of unstructured data can confuse rather than resolve these issues. Here we review the nascent field of WGS data storage for clinical application and show how curated sequence-based typing schemes on websites such as PubMLST.org, accumulated over the past 14 years or so, have generated an infrastructure that can be used to exploit WGS for bacterial typing efficiently. We review the tools implemented within the PubMLST.org website to extract clinically useful strain characterisation information, which can be provided to physicians and public health scientists and officials in a timely, concise and understandable way. These data can be used to inform medical decisions such as how to treat a patient, whether to institute public health action, and what action might be appropriate. The information is compatible both with previous sequence-based typing data and with data that can be obtained in the absence of WGS, for example by real-time PCR tests, providing a flexible infrastructure for WGS-based clinical microbiology.
Whole genome sequencing; antimicrobial resistance; MLST; antigen typing; meningococcus; epidemiology
Neisseria meningitidis expresses type four pili (Tfp), which are important for colonisation and virulence. Tfp have been considered one of the most variable structures on the bacterial surface because of high-frequency gene conversion, which results in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene, and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections.
We analysed over 600 publicly available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE; however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species, we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species.
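Establishing that a locus lacks a particular DNA motif comes down to scanning the flanking sequence on both strands. This is a minimal, generic sketch of such a scan. The motif string and flanking sequence below are placeholders; the published G4 and Sma/Cla sequences and the real region around pilE should be substituted before drawing any conclusion.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_motif(seq: str, motif: str):
    """Return (strand, 0-based start) for every occurrence of motif.

    Both strands are searched; positions are given in the coordinates of the
    input sequence, and overlapping hits are reported. A palindromic motif is
    reported once per strand.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(query)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(query, start + 1)
    return hits


# PLACEHOLDER motif and flanking sequence, not the real G4 or Sma/Cla sequences.
flank = "TTGGGAACCCTTAGGG"
print(find_motif(flank, "GGG"))  # [('+', 2), ('+', 13), ('-', 7)]
```

An empty result for the real motif across all class II loci would be consistent with the absence reported above; a single hit would warrant closer inspection.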
Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these isolates, although horizontal transfer may generate class II pilin diversity. This study supports the suggestion that high frequency antigenic variation of pilin is not universal in pathogenic Neisseria.
Type four pilus; Neisseria meningitidis; Class I pilin; Class II pilin; Antigenic variation
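Checking whether a pilE locus carries a G4 element amounts to scanning the flanking DNA for a G-quadruplex-forming motif. The sketch below is minimal and rests on one assumption: it uses the generic G-quadruplex pattern applied by many G4 prediction tools (four runs of three or more guanines separated by loops of 1 to 7 nucleotides), not the specific Neisseria pilE G4 sequence. The function names and toy sequences are hypothetical, and the Sma/Cla repeat is not modelled.

```python
import re
from typing import List, Tuple

# Generic G-quadruplex pattern (assumption, not the Neisseria-specific motif):
# four runs of >=3 G separated by 1-7 nt loops. The lookahead allows overlaps.
G4_PATTERN = re.compile(
    r"(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))"
)


def find_g4_motifs(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for putative G4 motifs on the given strand."""
    return [(m.start(), m.group(1)) for m in G4_PATTERN.finditer(seq.upper())]


def has_g4(flank: str) -> bool:
    """True if a putative G4 motif occurs on either strand of a flanking sequence."""
    complement = str.maketrans("ACGT", "TGCA")
    reverse_complement = flank.upper().translate(complement)[::-1]
    return bool(find_g4_motifs(flank) or find_g4_motifs(reverse_complement))


# Hypothetical toy sequences, not taken from any genome in the study.
print(find_g4_motifs("ATGGGTGGGTTGGGTGGGAT"))  # [(2, 'GGGTGGGTTGGGTGGG')]
print(has_g4("ATGCATGCATGCATGC"))             # False
```

Both strands are searched because a deposited sequence may be in either orientation relative to the locus. A locus that returns False here would be consistent with the "lacks G4" observation only under this generic pattern.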
Meningococcal FetA is an iron-regulated, immunogenic outer membrane protein and vaccine component. The most diverse region of this protein is a previously defined variable region (VR) that has been shown to be immunodominant. In this analysis, a total of 275 Neisseria lactamica isolates, collected during studies of nasopharyngeal bacterial carriage in infants, were examined for the presence of a fetA gene. The fetA VR nucleotide sequence was determined for 217 of these isolates; fetA was apparently absent from the remaining 58, most of which belonged to the ST-624 clonal complex. The VR in N. lactamica was compared with the same region in Neisseria meningitidis, Neisseria gonorrhoeae and a number of other commensal Neisseria. Identical fetA VR sequences were identified among commensal and pathogenic Neisseria, suggesting a common gene pool; in this respect FetA differs from other antigens. Carriage of commensal Neisseria species, such as N. lactamica, that express FetA may be involved in the development of natural immunity to meningococcal disease.
FetA; Neisseria meningitidis; commensal Neisseria; gene pool
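The "common gene pool" inference rests on finding identical VR sequences in more than one species. The sketch below shows that comparison in its simplest form. It assumes the VR sequences have already been extracted and paired with a species label, with isolates lacking fetA excluded. The function name and the short toy records are hypothetical and are not sequences from the study.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shared_vr_sequences(records: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each VR sequence observed in two or more species, with those species."""
    species_by_sequence: Dict[str, Set[str]] = defaultdict(set)
    for species, vr in records:
        # Normalise case and whitespace so that identical sequences group together.
        species_by_sequence[vr.strip().upper()].add(species)
    return {seq: spp for seq, spp in species_by_sequence.items() if len(spp) > 1}


# Hypothetical toy records (short strings, not real fetA VR sequences).
records = [
    ("N. lactamica", "ACGTTGCA"),
    ("N. meningitidis", "acgttgca"),
    ("N. gonorrhoeae", "TTTTGGGG"),
    ("N. lactamica", "CCCCAAAA"),
]
shared = shared_vr_sequences(records)
print(shared)
```

Exact string identity is used because the abstract reports identical VR sequences. Grouping by allele number from a typing database would work the same way.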
The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating them to disease epidemiology. Existing approaches typically focus either on homologous sequence variation in genes that are shared by all isolates or on non-homologous sequence variation, that is, genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates, a 7-isolate reference pan-genome of 3,933 loci was defined. A core genome of 1,035 genes was ubiquitous in the sample, accounting for 59% of the genes in each isolate (average genome size 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci, with gene presence defined as a BLAST match of ≥70% identity over ≥50% of the locus length; loci were aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.
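The presence rule described here (a BLAST match of ≥70% identity over ≥50% of the locus length), the core/accessory split and the species-specific gene lists can be sketched as set operations. This is a minimal sketch under several assumptions. BLAST hits are taken as already parsed into a percent identity and an alignment length per genome and locus. Coverage is taken as alignment length divided by reference locus length, an assumed definition. The `Hit` class, the function names and the toy values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

MIN_IDENTITY = 70.0  # percent identity threshold quoted in the abstract
MIN_COVERAGE = 0.50  # aligned fraction of the reference locus (assumed definition)


@dataclass
class Hit:
    identity: float      # percent identity of the best BLAST hit
    aligned_length: int  # length of that alignment in nucleotides


def is_present(hit: Optional[Hit], locus_length: int) -> bool:
    """Presence rule: >=70% identity over >=50% of the reference locus length."""
    if hit is None:
        return False
    return hit.identity >= MIN_IDENTITY and hit.aligned_length / locus_length >= MIN_COVERAGE


def presence_sets(hits: Dict[str, Dict[str, Hit]],
                  locus_lengths: Dict[str, int]) -> Dict[str, Set[str]]:
    """For each genome, the set of reference pan-genome loci called present."""
    return {
        genome: {locus for locus, length in locus_lengths.items()
                 if is_present(by_locus.get(locus), length)}
        for genome, by_locus in hits.items()
    }


def core_and_accessory(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core: present in every genome. Accessory: present in some, but not all."""
    found_anywhere = set().union(*presence.values())
    core = set.intersection(*presence.values())
    return core, found_anywhere - core


def species_specific(presence: Dict[str, Set[str]],
                     species_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Loci detected in at least one genome of a species and in no genome of another."""
    loci_by_species: Dict[str, Set[str]] = {}
    for genome, loci in presence.items():
        loci_by_species.setdefault(species_of[genome], set()).update(loci)
    return {
        sp: loci - set().union(*(other for s, other in loci_by_species.items() if s != sp))
        for sp, loci in loci_by_species.items()
    }


# Hypothetical toy data: three genomes screened against three reference loci.
locus_lengths = {"locA": 1000, "locB": 800, "locC": 600}
hits = {
    "jejuni_1": {"locA": Hit(98.0, 1000), "locB": Hit(85.0, 700)},
    "jejuni_2": {"locA": Hit(97.5, 990), "locB": Hit(72.0, 300)},  # 37.5% coverage: absent
    "coli_1": {"locA": Hit(88.0, 1000), "locC": Hit(91.0, 600)},
}
species_of = {"jejuni_1": "C. jejuni", "jejuni_2": "C. jejuni", "coli_1": "C. coli"}

presence = presence_sets(hits, locus_lengths)
core, accessory = core_and_accessory(presence)
specific = species_specific(presence, species_of)
print(sorted(core), sorted(accessory), specific)
```

The toy genome set is screened against a multi-genome reference rather than a single strain. This is what lets locB and locC register as species-specific at all, which mirrors the abstract's argument against a single-reference approach.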
Background. Clostridium difficile is a major cause of nosocomial diarrhea, with 30-day mortality reaching 30%. The cell surface comprises a paracrystalline proteinaceous S-layer encoded by the slpA gene within the cell wall protein (cwp) gene cluster. Our purpose was to understand the diversity and evolution of slpA and nearby genes also encoding immunodominant cell surface antigens.
Methods. Whole-genome sequences were determined for 57 C. difficile isolates representative of the population structure and different clinical phenotypes. Phylogenetic analyses were performed on the genomic region (>63 kb) spanning the cwp cluster.
Results. Genetic diversity across the cwp cluster peaked within slpA, cwp66 (adhesin), and secA2 (secretory translocase). These genes formed a 10-kb cassette, of which 12 divergent variants were found. Homologous recombination involving this cassette caused it to associate randomly with genotype. One cassette contained a novel insertion (length, approximately 24 kb) that resembled S-layer glycosylation gene clusters.
Conclusions. Genetic exchange of S-layer cassettes parallels polysaccharide capsular switching in other species. Both cause major antigenic shifts, while the remainder of the genome is unchanged. C. difficile genotype is therefore not predictive of antigenic type. S-layer switching and immune escape could help explain temporal and geographic variation in C. difficile epidemiology and may inform genotyping and vaccination strategies.
Clostridium difficile; S-layer; S-layer glycosylation; immunodominant antigen; recombination; switching; multilocus sequence type; genotype; evolution
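A statement such as "genetic diversity across the cwp cluster peaked within slpA" is often based on a sliding-window summary of an alignment. The abstract does not state which statistic was used. The sketch below therefore assumes average pairwise nucleotide diversity (π) over a gap-free alignment of equal-length sequences; any gaps would be counted as ordinary characters. The window size, step, function names and toy alignment are all hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean proportion of differing sites across all pairs of aligned sequences (pi)."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    if n_sites == 0 or not pairs:
        return 0.0
    differences = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * n_sites)


def sliding_window_pi(seqs: List[str], window: int, step: int) -> List[Tuple[int, float]]:
    """(0-based window start, pi) for every full-length window along the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Hypothetical toy alignment: conserved first half, divergent second half.
alignment = [
    "AAAAAAAAAACGTACGTACG",
    "AAAAAAAAAATGCATGCATG",
    "AAAAAAAAAAGTCAGTCAGT",
]
profile = sliding_window_pi(alignment, window=10, step=10)
peak_start, peak_pi = max(profile, key=lambda item: item[1])
print(profile, peak_start)
```

Over a real >63 kb region, overlapping windows (step smaller than window) would give a smoother profile. Diversity peaks would then be mapped back to gene coordinates such as slpA or cwp66.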
Meningococcal gyrA gene sequence data, MICs, and mouse infection were used to define the ciprofloxacin breakpoint for Neisseria meningitidis. Residue T91 or D95 of GyrA was altered in all meningococcal isolates with MICs of ≥0.064 μg/ml but not among isolates with MICs of ≤0.032 μg/ml. Experimental infection of ciprofloxacin-treated mice showed slower bacterial clearance when GyrA was altered. These data suggest a MIC of ≥0.064 μg/ml as the ciprofloxacin breakpoint for meningococci and argue for the molecular detection of ciprofloxacin resistance.
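The two criteria in this abstract can each be expressed as a simple check. The first is an altered GyrA residue at T91 or D95. The second is a MIC at or above the proposed 0.064 μg/ml breakpoint. This minimal sketch assumes a GyrA protein string numbered so that residue 1 is its first character; the numbering convention should be checked against the reference actually used. The function names and toy sequences are hypothetical.

```python
from typing import Dict

WILD_TYPE_GYRA = {91: "T", 95: "D"}  # residues named in the abstract (1-based positions)
CIPROFLOXACIN_BREAKPOINT = 0.064     # ug/ml; MIC at or above this is treated as resistant


def gyra_substitutions(protein: str) -> Dict[int, str]:
    """Label altered GyrA residues at positions 91 and 95, e.g. {91: 'T91I'}."""
    changes = {}
    for position, wild_type in WILD_TYPE_GYRA.items():
        residue = protein[position - 1]  # convert 1-based numbering to a 0-based index
        if residue != wild_type:
            changes[position] = f"{wild_type}{position}{residue}"
    return changes


def resistant_by_mic(mic_ug_per_ml: float) -> bool:
    return mic_ug_per_ml >= CIPROFLOXACIN_BREAKPOINT


def genotype_matches_phenotype(protein: str, mic_ug_per_ml: float) -> bool:
    """True when 'has a T91/D95 change' agrees with 'MIC is at or above the breakpoint'."""
    return bool(gyra_substitutions(protein)) == resistant_by_mic(mic_ug_per_ml)


# Hypothetical toy sequences, not real GyrA: only positions 91 and 95 are meaningful.
wild = "M" + "A" * 89 + "T" + "AAA" + "D" + "A" * 10
mutant = wild[:90] + "I" + wild[91:]  # T91I

print(gyra_substitutions(mutant))                 # {91: 'T91I'}
print(genotype_matches_phenotype(wild, 0.032))    # True
print(genotype_matches_phenotype(mutant, 0.064))  # True
```

Running the concordance check across a collection is one way to see the pattern the abstract reports: every isolate at or above the breakpoint carries a substitution, and none below it does.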
Phylogenies generated from whole genome sequence (WGS) data provide a definitive means of bacterial isolate characterization for typing and taxonomy. The species status of strains recently defined with conventional taxonomic approaches as representing Neisseria oralis was examined by the analysis of sequences derived from WGS data, specifically: (i) 53 Neisseria ribosomal protein subunit (rps) genes (ribosomal multilocus sequence typing, rMLST); and (ii) 246 Neisseria core genes (core genome MLST, cgMLST). These data were compared with phylogenies derived from 16S and 23S rRNA gene sequences, demonstrating that the N. oralis strains were monophyletic with strains described previously as representing ‘Neisseria mucosa var. heidelbergensis’ and that this group was of equivalent taxonomic status to other well-described species of the genus Neisseria. Phylogenetic analyses also indicated that Neisseria sicca and Neisseria macacae should be considered the same species as Neisseria mucosa and that Neisseria flavescens should be considered the same species as Neisseria subflava. Analyses using rMLST showed that some strains currently defined as belonging to the genus Neisseria were more closely related to species belonging to other genera within the family; however, whole genome analysis of a more comprehensive selection of strains from within the family Neisseriaceae would be necessary to confirm this. We suggest that strains previously identified as representing ‘N. mucosa var. heidelbergensis’ and deposited in culture collections should be renamed N. oralis. Finally, one of the N. oralis strains was able to ferment lactose, owing to the presence of β-galactosidase and lactose permease genes. This characteristic was previously thought to be unique to Neisseria lactamica, so lactose fermentation cannot be considered diagnostic for that species. Nevertheless, the rMLST and cgMLST analyses confirm that N. oralis is most closely related to N. mucosa.
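rMLST and cgMLST describe each isolate as a profile of allele numbers, one per locus, and phylogenies or networks are then built from comparisons of those profiles. The sketch below shows one common pairwise measure: the proportion of loci, typed in both isolates, at which allele numbers differ. It is an illustration only, not the specific tree-building method used in this study. The locus names, strain names and allele numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Profile = Dict[str, Optional[int]]  # locus name -> allele number; None marks a missing locus


def allelic_p_distance(a: Profile, b: Profile) -> float:
    """Proportion of loci typed in both profiles that carry different allele numbers."""
    comparable = [locus for locus in a
                  if locus in b and a[locus] is not None and b[locus] is not None]
    if not comparable:
        return float("nan")  # nothing to compare; do not report the pair as identical
    return sum(a[locus] != b[locus] for locus in comparable) / len(comparable)


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], float]:
    """Distances for every unordered pair of isolates (names sorted)."""
    return {(x, y): allelic_p_distance(profiles[x], profiles[y])
            for x, y in combinations(sorted(profiles), 2)}


# Hypothetical toy profiles over four invented loci.
profiles = {
    "strain_A": {"locus_01": 1, "locus_02": 4, "locus_03": 7, "locus_04": 2},
    "strain_B": {"locus_01": 1, "locus_02": 4, "locus_03": 9, "locus_04": None},
    "strain_C": {"locus_01": 3, "locus_02": 5, "locus_03": 8, "locus_04": 6},
}
matrix = distance_matrix(profiles)
for pair, distance in matrix.items():
    print(pair, round(distance, 3))
```

Missing loci are excluded pairwise rather than counted as differences. This matters for draft genomes, where a locus may be absent from an assembly for technical rather than biological reasons.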
Sequence-based typing is essential for understanding the epidemiology of Campylobacter infections, a major worldwide cause of bacterial gastroenteritis. We demonstrate the practical and rapid exploitation of whole-genome sequencing to provide routine definitive characterization of Campylobacter jejuni and Campylobacter coli for clinical and public health purposes. Short-read data from 384 Campylobacter clinical isolates collected over 4 months in Oxford, United Kingdom, were assembled de novo. Contigs were deposited at the pubMLST.org/campylobacter website and automatically annotated for 1,667 loci. Typing and phylogenetic information was extracted and comparative analyses were performed for various subsets of loci, up to the level of the whole genome, using the Genome Comparator and Neighbor-net algorithms. The assembled sequences (for 379 isolates) were diverse and resembled collections from previous studies of human campylobacteriosis. Small subsets of very closely related isolates originated mainly from repeated sampling from the same patients and, in one case, from probable laboratory contamination. Much of the within-patient variation occurred in phase-variable genes. Clinically and epidemiologically informative data can be extracted from whole-genome sequence data in real time with straightforward, publicly available tools. These analyses are highly scalable and transparent, do not require closely related genome reference sequences, and provide improved resolution (i) among Campylobacter clonal complexes and (ii) between very closely related isolates. Additionally, these analyses rapidly differentiated unrelated isolates, allowing the detection of single-strain clusters. The approach is widely applicable to analyses of human bacterial pathogens in real time in clinical laboratories, with little specialist training required.
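Detecting small sets of very closely related isolates, such as repeat samples from one patient, can be sketched as single-linkage grouping of isolates whose allele profiles differ at no more than a chosen number of loci. This is a minimal sketch of that general idea, not the Genome Comparator or Neighbor-net algorithms named in the abstract. It assumes profiles are dictionaries of locus to allele number, with untyped loci simply left out. The threshold, isolate names and allele numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def allele_differences(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Count loci present in both profiles that carry different allele numbers."""
    return sum(1 for locus in a.keys() & b.keys() if a[locus] != b[locus])


def single_linkage_clusters(profiles: Dict[str, Dict[str, int]], max_diffs: int) -> List[Set[str]]:
    """Group isolates joined by chains of pairs differing at <= max_diffs loci."""
    parent = {name: name for name in profiles}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for x, y in combinations(profiles, 2):
        if allele_differences(profiles[x], profiles[y]) <= max_diffs:
            parent[find(x)] = find(y)

    clusters: Dict[str, Set[str]] = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical toy profiles over four invented loci.
profiles = {
    "patient1_a": {"L1": 5, "L2": 8, "L3": 2, "L4": 11},
    "patient1_b": {"L1": 5, "L2": 8, "L3": 2, "L4": 12},
    "patient2": {"L1": 7, "L2": 3, "L3": 9, "L4": 4},
    "patient3": {"L1": 5, "L2": 1, "L3": 6, "L4": 11},
}
clusters = single_linkage_clusters(profiles, max_diffs=1)
print(clusters)
```

Single linkage is chosen here for simplicity. Because it chains, one intermediate isolate can merge two groups, so in practice any threshold would need to be set with the locus scheme and the expected within-patient variation in mind.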