|Home | About | Journals | Submit | Contact Us | Français|
Despite high pertussis vaccine coverage, reported cases of whooping cough (pertussis) have increased over the last decade in the United States and other developed countries. Although Bordetella pertussis is well known for its limited gene sequence variation, recent advances in long-read sequencing technology have begun to reveal genomic structural heterogeneity among otherwise indistinguishable isolates, even within geographically or temporally defined epidemics. We have compared rearrangements among complete genome assemblies from 257 B. pertussis isolates to examine the potential evolution of the chromosomal structure in a pathogen with minimal gene nucleotide sequence diversity. Discrete changes in gene order were identified that differentiated genomes from vaccine reference strains and clinical isolates of various genotypes, frequently along phylogenetic boundaries defined by single nucleotide polymorphisms. The observed rearrangements were primarily large inversions centered on the replication origin or terminus and flanked by IS481, a mobile genetic element with >240 copies per genome and previously suspected to mediate rearrangements and deletions by homologous recombination. These data illustrate that structural genome evolution in B. pertussis is not limited to reduction but also includes rearrangement. Therefore, although genomes of clinical isolates are structurally diverse, specific changes in gene order are conserved, perhaps due to positive selection, providing novel information for investigating disease resurgence and molecular epidemiology.
IMPORTANCE Whooping cough, primarily caused by Bordetella pertussis, has resurged in the United States even though the coverage with pertussis-containing vaccines remains high. The rise in reported cases has included increased disease rates among all vaccinated age groups, provoking questions about the pathogen's evolution. The chromosome of B. pertussis includes a large number of repetitive mobile genetic elements that obstruct genome analysis. However, these mobile elements facilitate large rearrangements that alter the order and orientation of essential protein-encoding genes, which otherwise exhibit little nucleotide sequence diversity. By comparing the complete genome assemblies from 257 isolates, we show that specific rearrangements have been conserved throughout recent evolutionary history, perhaps by eliciting changes in gene expression, which may also provide useful information for molecular epidemiology.
Bordetella pertussis is the causative agent of whooping cough (pertussis), a respiratory disease with the highest morbidity and mortality in young infants. The introduction of vaccines against pertussis during the 1940s dramatically reduced the disease incidence in the United States. However, despite high or increasing coverage with pertussis-containing vaccines, the number of reported pertussis cases in the United States and many other developed countries has increased over the last decade, with notable recent epidemics (1,–3). Multiple factors likely contribute to increased disease reporting, including heightened awareness, expanded surveillance, improved laboratory diagnostic testing, and allelic mismatch between circulating and vaccine reference strains (2, 4, 5). Waning protection conferred by acellular vaccine formulations, which replaced whole-cell preparations in the United States during the 1990s, has also led to increased disease rates among vaccinated individuals (2, 6,–8).
Often cited for its limited genetic variability, B. pertussis has earned a reputation as a monomorphic pathogen (9, 10). Previous genomic studies have identified only a few mutations within a limited number of genes, each of which have quickly spread throughout the circulating population with little evidence of geographic restriction (5, 11). These mutations have occurred in genes encoding immunogenic proteins, most notably those for antigens in currently used acellular pertussis vaccines, such as genes for pertussis toxin (ptxA and the promoter region ptxP) and fimbriae (fimH), leading many to conclude that they result from vaccine-driven selection (5, 12,–15). Although sequence diversity has also been observed in the vaccine immunogen pertactin (Prn), the circulating isolates recovered in the United States have become predominantly Prn deficient in recent years by one of at least 16 different mutations to the prn gene (16). The global emergence of Prn deficiency is well documented (16,–19), and deficiency may confer a fitness advantage during infection (20, 21). While Prn deficiency appears to be more prevalent in isolates recovered from fully vaccinated patients, it does not appear to impact the effectiveness of acellular pertussis vaccines (22, 23).
Such limited genetic variation appears contradictory to the restricted ecology of B. pertussis as an obligate human pathogen, where ongoing adaptation is expected for its continued survival in the host population. Low-resolution genomic approaches have previously identified chromosomal diversity among B. pertussis isolates within epidemics by marker gene mapping (24), hybridization (25), and pulsed-field gel electrophoresis (PFGE) (1). A genome sequence-level characterization of structural variation has only recently become possible through advances in long-read sequencing technology capable of spanning the many insertion sequence (IS) elements present in B. pertussis permitting circular assembly. While initial comparisons of complete genome assemblies have revealed considerable rearrangement plasticity among B. pertussis isolates, these studies have been limited to small sample sizes due to high sequencing costs (9, 26,–28).
For these reasons, B. pertussis presents unique obstacles to comparative genomics and sequence-based molecular epidemiology that are not shared by many other bacterial pathogens. To address these challenges, we have reconstructed the rearrangement history of the genetic content in circulating B. pertussis by analyzing the complete genome assemblies from 257 isolates with varied chromosomal structures. The results from this study provide evidence of conservation within the observed patterns of rearrangement, some of which correlate with previously described allelic replacements, such as the switch from ptxP1 to ptxP3. Furthermore, the distribution of structures within a single nucleotide polymorphism (SNP) phylogeny uncovered specific inversions, as well as the local arrangement of functional genes nearby, that underlie the PFGE profiles predominant within the circulating population. These results broaden the understanding of ongoing B. pertussis evolution to include specific gene order changes.
The current study compared 257 complete circular genome assemblies from isolates of B. pertussis, which captured the full complement of gene content and nucleotide sequence information (see Table S1 in the supplemental material). An alignment revealed considerable variation in genomic structures (Fig. 1). The assemblies presented here, most of which came from clinical isolates recovered in the United States over the past decade, exhibited 62 discrete genomic structures (Fig. 1B). Two hundred forty-seven of these genomes were recovered from isolates of the ptxP3 lineage and included 53 unique structures. These structures largely correlated with PFGE patterns such that genomes from isolates with the same PFGE profiles were frequently colinear. The abundances of the observed structures varied (Fig. 1C), and the most common were CDC237 (n = 59) and CDC002 (n = 42) (named according to the associated PFGE profile), reflecting their prevalences in the circulating population. Thirty-five structures were uniquely present in only one genome (“singletons”), suggesting that structural diversity remains undersampled. Likewise, the genomes of isolates from the ptxP1 lineage and the vaccine reference strains (10536 [B203], CS [C393], and Tohama I [E476 and J169]) all exhibited different structures (Fig. 1B). No differences in genomic structure were observed by restriction digest optical mapping or by genome sequencing following 11 serial passages of a clinical isolate. Although the genomes exhibited considerable rearrangement plasticity, structures appeared stable during routine laboratory manipulations and likely did not change between clinical isolation and genome sequencing.
The observed rearrangements within genomes of different structures were primarily in the form of large inversions and were frequently flanked by insertions of IS481, a mobile genetic element with >240 copies per genome. All three copies of the rRNA operon were also observed flanking rearrangements. Minor variants of abundant structures were detected in smaller groups of isolates (e.g., CDC046-2) (Fig. 1C) and among singletons. For example, the genomes of some isolates with PFGE profile CDC010 differed by small unique inversions (see Fig. S1). Other minor variants resulted from small deletions, which averaged 5 kb, and were the only source of gene content variation detected among the ptxP3 genomes. These results are consistent with the prevailing view that B. pertussis gene content is relatively static and is occasionally altered through IS-mediated erosion but not horizontal acquisition.
To illustrate the relationships between observed genomic structures, a subset representing each of the 62 unique structures was aligned and then clustered on the basis of relative differences in the order and orientation of shared homologous sequence blocks representing >94% of each genome. The resulting tree topology (Fig. 2) was largely concordant with a hierarchical clustering of PFGE profiles on the basis of shared genome fragment sizes (see Fig. S2). As with PFGE, the clustering according to global genomic structures discretely separated isolates of the ptxP1 and ptxP3 lineages from vaccine reference strains and from each other. Phylogenetic divergence of the ptxP1 and ptxP3 lineages is well documented (5), and these results suggest that the two groups are distinguishable by conserved genomic structural features, not just by nucleotide sequence polymorphisms. Despite the considerable structural heterogeneity observed among the 247 ptxP3 genomes, all were much more similar to each other than to the widely used reference strain Tohama I (E476).
The genome sequences of additional isolates, most recovered in Europe, were completed recently and are available in public databases (26, 27, 29, 30). An alignment of these 17 genomes with a representative subset of the data here revealed that their structures were largely different (see Fig. S3 and S4). Specifically, only two recent European isolates, B1865 (GenBank accession no. CP011441) and B3621 (accession no. CP011401), were colinear with the CDC013 structure. The structural differences between these data sets may simply reflect undersampling, particularly given the distribution of abundances observed within the data here (Fig. 1C).
SNPs were predicted throughout the core genome for investigating the phylogenetic distribution of genomic structures. All IS elements, rRNA operons, and the highly variable prn gene were explicitly masked to minimize the influence of known sites of recombination and homoplasy. A total of 1,473 variable core positions were identified, representing 0.036% of the average B. pertussis genome, and were used to reconstruct the phylogeny with maximum parsimony (Fig. 3). The isolates separated into well-supported clades according to their ptxP and fimH (fim3) alleles, consistent with SNP phylogenies reported elsewhere (5, 11). Certain genomic structures appeared phylogenetically restricted, which is to say that strains with colinear genomes were also closely related according to their SNP patterns. For example, structures associated with PFGE profiles CDC217, CDC237, CDC242, and CDC253 each resided within a single clade in the phylogenetic tree (Fig. 3). Whether isolates with a common genomic structure were distributed across relatively small (e.g., CDC242) or large (e.g., CDC237) geographic distances, they shared an observable ancestral heritage.
Other genomic structures did not coalesce within the phylogeny but instead comingled with other similar structures. Genomic structures represented by PFGE profiles CDC002 and CDC010 were intermingled, as were CDC013 and CDC046. Each of these pairs was confined to the SNP background defined by either fimH1 (CDC002 and CDC010) or fimH2 (CDC013 and CDC046). The distribution of structures within the phylogeny suggests that, despite the observed heterogeneity and possibly variable rearrangement rates, global genomic structures are surprisingly stable and thus potentially traceable.
The phylogeny of the genomes analyzed was reconstructed while excluding the nucleotide sequence of the highly variable gene prn, which encodes an immunogenic protein found in most current acellular vaccine formulations. Clinical isolates recovered in the United States have become increasingly Prn deficient in recent years by a variety of mutations to prn, including missense substitutions, insertions, deletions, and promoter disruption, but most frequently through IS481 insertion at one of three positions (16). Most mutations appeared to be restricted within the phylogeny of genomes here, with the exception of the IS481 insertions, even when closely related strains differed in genomic structure (Fig. 3). Homoplastic insertions likely resulted from independent events, and the observation of IS481 insertion in both forward and reverse orientations at positions 1,613 and 2,735 provides direct evidence of this. These results suggest that most prn-disrupting mutations occurred once in the circulating population but that prn-disrupting IS481 insertions have occurred repeatedly at the same positions.
The phylogenetic distribution of genomic structures exposed pairs that appeared intermingled, namely, CDC002/CDC010 and CDC013/CDC046, within the SNP backgrounds defined by fimH1 and fimH2, respectively (Fig. 3). An alignment of representatives with these structures revealed that each pair differed by a common single inversion (Fig. 4A). The boundaries of this inversion contained a three-gene inverted repeat including IS481 and encoding a hypothetical protein and a predicted major facilitator superfamily (MFS) membrane protein. Nearby genes encoded proteins with various functions, including leucine biosynthesis and transport, fatty acid biosynthesis, organic acid transport, diguanylate cyclase activity, protein stability, and siderophore biosynthesis and transport (see Data Set S1). Structures CDC237 and CDC300 also differed by the same inversion (Fig. 4A), suggesting that this rearrangement had occurred repeatedly, and perhaps reversibly, within the ptxP3 lineage.
The PFGE profile CDC237 was associated with the most common genomic structure in the data presented here (Fig. 1C). A phylogenetic reconstruction indicated that these 59 genomes shared a clonal SNP background and conserved IS481 disruption of prn at position 1613 (Fig. 3). The placement of the CDC237 clade shed light on its potential emergence from similar structures present in closely related isolates (Fig. 3), and a clear path of sequential inversion was inferred by an alignment of representative genome sequences (Fig. 4B). Whole-genome SNP patterns and structures together suggested that CDC237 descended from the progressive rearrangement of CDC002 to CDC010 to CDC217 to CDC237. The inversion from CDC217 to CDC237 produced novel gene order arrangements not observed elsewhere in the data set (Fig. 5). Annotation of nearby genes revealed the operon and dedicated regulator for the biosynthesis of the excreted polysaccharide Bps (Fig. 5; see also Data Set S1). Genomes in the clade containing CDC237 and CDC300 also included specific SNPs present in a few genes encoding central metabolic proteins (Data Set S1).
The clustering of genomes according to global structure discretely separated vaccine reference strains and ptxP1 and ptxP3 isolates, indicating that these groups had diverged in genomic structure (Fig. 2). Filtering of the structural differences observed in an alignment of representative genomes revealed a limited number of discrete deletions, inversions, and rearrangement boundaries specific to certain lineages (Table 1). Three conserved deletions were observed, including two in all clinical isolate genomes relative to vaccine reference strains and one specific to ptxP3 isolates only (Table 1; see also Data Set S2). Three inversions, all flanked by IS481 insertions, were also detected; one was present in all vaccine references, another was present in all ptxP3 genomes, and one differentiated ptxP3 fimH1 and ptxP3 fimH2 genomes (Table 1; see also Data Set S2). The vaccine- and ptxP3-specific inversions were asymmetric with respect to the predicted replication origin and terminus. Three individual rearrangement boundaries were specific to ptxP3 genomes, including two that were present and one that was absent compared with vaccine reference and ptxP1 genomes, and the local arrangement of genes at these boundaries differed between lineages (Table 1; see also Data Set S2). No specific local structures were identified among genomes of nonvaccine ptxP1 isolates or Prn-deficient isolates. These results indicated that structural features have become fixed throughout the phylogenetic history of B. pertussis and that such loci may contribute toward adaptive evolution.
Conserved insertion sites and IS element content variability were further tracked in two colinear groups, the phylogenetically linked CDC237 (n = 59) and phylogenetically diverse CDC002 (n = 42). In both groups, the majority of IS481 insertions (>90%) were present at conserved sites in all of the genomes, including among isolates recovered as much as 14 years apart (Fig. S5). In at least one genome, 23% of these conserved sites contained multiple neighboring (“duplicated”) insertions, in which a second or third IS481 had inserted adjacent to an existing copy, sharing a 6-bp target sequence (Fig. S5). Individual genomes in each group had an average of 17 sites with multiple insertions (range, 13 to 21). Variable insertion sites, i.e., those that lacked IS481 in one or more genomes, were also observed within both groups, but only three such insertions on average were present in each genome (range, 1 to 6). No changes in IS481 content were observed in either of two parallel replicates of a clinical isolate following serial passage. No variations in IS1002 (n = 5) or IS1663 (n = 16) content were observed in these two sets of colinear genomes. Therefore, IS481 copy number did not equal the number of unique loci disrupted by insertion, and the observed variation resulted primarily from multiple insertions at conserved sites rather than differential insertions at unique sites. The distribution patterns of some variable insertions within CDC002 genomes were phylogenetically linked but did not suggest that IS481 copy number had increased over time in this group (Fig. S5).
IS481 is inserted at specific 6-bp target sequences, which become duplicated, flanking the element upon insertion (31). Forty-two unique sequences were observed flanking IS481 insertions, and the target sequence motif was calculated from the seven most common sequences in six phylogenetically disparate genomes (see Fig. S6). Using the resulting motif, all 257 genomes were queried for 6-bp target sequences to identify putative unoccupied sites susceptible to insertion. The genomes contained an average of 249 unoccupied sites (range, 242 to 255), suggesting that IS481 copies currently occupy approximately half of all possible insertion sites. On average, 146 available sites (range, 139 to 150) fell within predicted coding regions, with some genes containing multiple sites, and the remaining 103 (range, 99 to 107) were within intergenic regions. The 142 unique genes harboring IS481 target sequences encoded predicted proteins of various functions (Fig. S6; Data Set S3). The availability of these sites for future insertions implies that these genes either present opportunities for further genome erosion or are essential for B. pertussis growth.
In this study, we introduced a new comprehensive picture of B. pertussis genome variation through a comparative analysis of complete assemblies from 257 isolates. Just as the first closed sequence of Tohama I revealed much about B. pertussis speciation from a B. bronchiseptica-like ancestor (32), the data here further revise the understanding of B. pertussis evolution. Previous studies have indicated that the recent genetic history of B. pertussis has been punctuated by the successive accumulation of small mutations (5, 11). Advances in sequencing and optical mapping technologies make complete assemblies more accessible, and the results here reveal evidence of conservation in genomic structural changes, which may result from selective pressure in favor of specific gene arrangements.
Bacterial chromosome organization is not random but rather appears to favor a conserved core gene order, at least within species or lineages (33). Such organization is heavily influenced by bidirectional replication, which imprints strand biases for gene distribution (33, 34) and coordinates gene expression (35,–37), imposing constraints on rearrangement. Comparative genomics across many species has suggested that selection operates to maintain replichore balance, such that rearrangements in the form of symmetric inversions centered on the replication origin or terminus appear to be a common feature of bacterial genome evolution (34, 38, 39). Indeed, many of the rearrangements reported here were symmetric inversions, and large uninterrupted regions were frequently observed, consistent with this understanding.
The observed linkage between SNPs and rearrangements in the population makes it challenging to infer which are under positive selection and which, if any, are hitchhiking. Although the ptxP3 allele is thought to increase pertussis toxin expression (40, 41), a recent comparison of isogenic mutants found that the ptxP3 allele and the genetic background, which includes genomic structure, independently enhance colonization in mice (42). Similarly, although alleles of fimH contain polymorphisms in the predicted surface epitope region, differences in immune recognition remain untested (43). If selective forces govern genomic architecture, then some rearrangements likely carry fitness effects, including possible benefits. Rearrangement-mediated adaptation has been observed during laboratory evolution experiments, including recombination between IS elements (44, 45). Perhaps recently observed population sweeps have actually been driven by selection in favor of genomic structure, not nucleotide sequence. Inversions specific to either isolates with the ptxP3 allele or vaccine reference strains appeared within a single replichore (i.e., they were asymmetric), suggesting that these rearrangements confer beneficial effects sufficiently large to offset the costs associated with disrupting coding strand bias. Further evidence of selection for gene order may exist in temporal fluctuations in PFGE profiles (Fig. 6B), a proxy for genomic structure, which generally occur in the absence of significant gene sequence variation in B. pertussis. However, until experiments can appropriately compare isogenic strains with varied genomic structures, the adaptive nature of B. pertussis rearrangement remains speculative.
A lack of geographical SNP clustering in the global B. pertussis population was reported previously (5, 11), and similarly, no such trends in genome structural variation were observed in the data here (Fig. 6). PFGE profile CDC237 has been the most abundant profile among U.S. clinical isolates since 2012 (46) and was reported for 55% of those received at the CDC in 2015 (Fig. 6B). Although isolates sequenced here with this profile were collected across 13 disparate states from 2010 to 2014, a phylogenetic reconstruction indicated that this group had a clonal origin. The data here highlighted a possible path of three successive inversions toward the emergence of CDC237, which produced novel local gene order conformations. Annotation of nearby genes revealed the operon and dedicated regulator for the biosynthesis of Bps, an excreted polysaccharide necessary for biofilm formation (47) that also confers resistance to complement-mediated killing (48). How this unique gene reorganization might impact B. pertussis fitness or virulence by altering Bps production remains unanswered, but the recent prevalence of CDC237 hints that this genotype has some putative advantage.
Contextualizing the genomic structural relatedness within the SNP phylogeny also revealed an identical inversion that transpired in the divergent fimH1 and fimH2 backgrounds, independently. Structures did not appear monophyletic within either the fimH1 or fimH2 background, suggesting that the inversion has occurred multiple times or perhaps reversibly in each. The switch from CDC237 to CDC300 also occurred by an inversion at the same boundaries in the fimH1 background, providing further evidence for reversibility. The biosynthesis and transport of alcaligin, an important siderophore produced by B. pertussis and B. bronchiseptica (49), were encoded nearby. Whether this inversion modulates alcaligin production or whether the observed parallelism reflects positive selection rather than neutral polymorphism has yet to be explored.
In contrast to that in many other bacterial pathogens, the genetic content in B. pertussis is largely invariable, even between epidemic and nonepidemic isolates (14). Although no evidence of gene gain has been reported, many previous studies have identified regions of difference resulting from apparent deletions conserved among circulating isolates in comparison with older references (12, 14, 50, 51). The conserved structural features identified here, specifically those which differentiated isolates with the ptxP3 allele and vaccine reference strains, included deletions flanked by IS481 insertions, each of which corresponds to regions of difference identified previously (12, 14, 25, 50,–53). However, conserved rearrangements and inversions were also identified, illustrating that structural genome evolution in B. pertussis is not limited to reduction.
These new data expand the understanding of how extensively IS481 contributes to B. pertussis genome evolution, beyond previously reported gene disruption and recombination-mediated deletion. They also permit accurate accounting of content variation between related genomes, which has uncovered fluctuations in IS481 insertion numbers at conserved sites. Such neighboring insertions differ across short and long phylogenetic distances within the ptxP3 lineage and appear to be the primary determinants of total IS481 copy number variation between closely related isolates, rather than insertions at unique sites. The mechanism and effects of these fluctuations, perhaps by modulating downstream gene expression (54, 55), require further study as their detection is only now possible with such complete genomic data. Many coding regions harboring predicted, but unoccupied, IS481 insertion sites were conserved across phylogenetically disparate isolates. These loci may encode the proteins necessary for survival and thus might be useful targets for improved molecular typing or vaccine immunogens.
The structural plasticity reported here portrays a dynamic circulating population, replete with genome rearrangement as a source of variation for natural selection, independent of mutations to primary gene sequences, and further illustrates the dramatic divergence of circulating isolates away from common reference strains (56). How gene order influences fitness or virulence remains unanswered, and phenotypic comparisons between defined genomic structures are under way. Even if structural conservation simply reflects population bottlenecks experienced during each selective sweep, the data here clearly demonstrate signs of stability and suggest that limits to rearrangement or rearrangement rate exist within B. pertussis. Although the utility of genome structure for molecular epidemiology requires further exploration, application of this knowledge to routine surveillance is particularly challenging, as short-read benchtop sequencers, which are becoming common in public health laboratories, are currently incapable of detecting such diversity. The new depth of measurable variation among circulating B. pertussis isolates revealed here will strengthen investigations of recent disease resurgence.
U.S. B. pertussis isolates for sequencing were selected by various strategies from the CDC collection, including isolates collected through surveillance and outbreaks, and assembled genomes were included in the current study on the basis of their availability. One set was selected to capture potential regional diversity among 28 states by maximizing the variety of source states for isolates collected from 2000 to 2013, with an emphasis on those collected from 2010 to 2013 (n = 80). Another selection focused on isolates obtained through the Enhanced Pertussis Surveillance/Emerging Infection Program Network (57) in seven states from 2011 to 2014, which prioritized isolates from hospitalized cases and forced sampling of every possible combination of year, state, vaccination status, and age group (n = 130). Additionally, eight isolates selected prospectively from sporadic submission in 2014, 31 epidemic isolates sequenced previously (28), and eight others, such as vaccine reference strains, were also included.
PFGE was performed using restriction enzyme XbaI according to the method developed by Gautom (58). PFGE patterns were compared with those in a database of B. pertussis isolate profiles maintained at the CDC, and profiles were assigned on the basis of bands in the 125- to 450-kb range using BioNumerics v5.01 (Applied Maths, Austin, TX).
Isolates were cultured on Regan-Lowe agar without cephalexin for 72 h at 37°C. Genomic DNA (gDNA) isolation and purification were performed using the Gentra Puregene yeast/bacteria kit (Qiagen, Valencia, CA) with slight modification. Briefly, two aliquots of approximately 109 bacterial cells were harvested and resuspended in 500 μl of 0.85% sterile saline and then pelleted by centrifugation for 1 min at 16,000 × g. Recovered genomic DNA was resuspended in 100 μl of DNA hydration solution. Aliquots were quantified using a Nanodrop 2000 (Thermo Fisher Scientific, Inc., Wilmington, DE).
Whole-genome shotgun sequencing of isolates was performed using a combination of the PacBio RSII (Pacific Biosciences, Menlo Park, CA), Illumina HiSeq/MiSeq (Illumina, San Diego, CA), and Argus (OpGen, Gaithersburg, MA) platforms as described previously (28). Briefly, genomic DNA libraries were prepared for PacBio sequencing using SMRTbell template prep kit 1.0 and polymerase binding kit P4, while Illumina libraries were prepared using the NEB Ultra library prep kit (New England BioLabs, Ipswich, MA). De novo assembly was performed using the Hierarchical Genome Assembly Process (HGAP v3; Pacific Biosciences) (59). The resulting consensus sequences were manually checked for circularity using gepard (v1.30) (60) and then reordered to match the start of Tohama I (GenBank accession no. NC_002929). Assemblies were confirmed by comparison with restriction digest optical maps using the Argus system (OpGen) with MapSolver (v.2.1.1; OpGen) and further polished by mapping Illumina reads using the CLC genomics workbench (v8.5; CLC bio, Boston, MA). Assemblies were annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP).
Genomes were aligned in all pairwise combinations using progressiveMauve with default settings (61) and were determined to be colinear if alignments included no observable inversions or gaps of >1,500 bp. A nonredundant subset of genomes representing all unique structures was aligned using progressiveMauve with optimized parameters (seed-weight, 16; hmm-identity, 0.85). Permutation matrices of homologous sequence blocks of >1,500 bp were calculated from progressiveMauve output files with custom Perl scripts and then used to cluster genomes on the basis of structural similarity with the Maximum Likelihood for Gene Order (MLGO) pipeline (62). Phylogenetic reconstruction was calculated using kSNP3 (63), with k of 23, after masking the pertactin gene (prn) and all IS-element sequences with N's. A maximum parsimony tree was calculated from core variable positions in kSNP3. For all trees, internal nodes with <50% bootstrap support were collapsed into multifurcations in Archaeopteryx v0.9901 (64) and annotation was performed with iTOL v3.0 (65). Alleles for common molecular typing loci (ptxP, ptxA, ptxB, fimH, and prn) were assigned by a high-stringency BLASTn alignment to assembled genomes. Mutations to prn were assigned using a custom curated database.
Conserved structural changes were identified by searching the distribution of shared adjacencies from MLGO using Fisher's exact test with Benjamini-Hochberg correction for multiple testing in a subset of representative genomic structures, excluding all singleton ptxP3 genomes. The identified features were validated by a manual inspection of progressiveMauve and BLASTn alignments. The predicted protein sequences encoded by genes near specific rearrangement sites were functionally classified according to a betaproteobacterium-specific subset of EggNOG v4.1 (66) using HMMER v3.1b2 (http://hmmer.org) and further annotated by a manual query of Swiss-Prot (67) and the Conserved Domain Database (68) using DELTA-BLAST (69) via the NCBI web interface.
A single glycerol stock bead of clinical isolate H866 was suspended in 100 μl saline and spread on Regan-Lowe agar without cephalexin at serial dilutions down to 10−6. A single colony (ancestor) was recovered from the most dilute plate, was suspended in 50 μl saline, and was spread onto a new plate. After incubation, bacterial growth was recovered with a cotton swab and suspended in saline. A small volume of the suspension was spread on a new plate and the remainder was used for gDNA extraction as described above. After 11 passages, multiple colonies were recovered for optical mapping and gDNA was extracted from the remaining culture for genome sequencing.
Positions of IS481 (GenBank accession no. M22031) insertions were identified by a BLASTn query of assembled genomes and then matched across colinear genomes according to exhaustive pairwise progressiveMauve alignments using the parameters described above in combination with custom Perl scripts. Insertions were classified as either “conserved” if their positions and orientations were shared in all genomes or “variable” if they were absent in one or more genomes. Sites of multiple insertion were identified if neighboring insertions were separated only by their 6-bp target sequence and were in the same orientation.
The insertion target consensus sequence was determined from the 6 bp flanking all IS481 insertions in six phylogenetically disparate strains (B203, E150, E476, E976, I344, and J090). A motif was calculated from the seven most abundant sequences in each, a total of 2,314 sequences representing >77% of sites in each genome, using MEME (v4.10.2) with a 5-order Markov frequency background model (70). Putative IS481 insertion target sequences were predicted with FIMO (v4.10.2) (71), followed by subtraction of sites with known insertions identified by BLASTn alignment. The predicted protein sequences from genes containing target sites were functionally classified as described above.
The source code for custom scripts developed in the present study is available at https://github.com/mikeyweigand/Pertussis_n257.
The whole-genome shotgun sequences have been deposited at DDBJ/EMBL/GenBank under accession numbers CP011167 to CP011208, CP011234 to CP011244, CP011255, CP011687 to CP011768, CP012078 to CP012089, CP012129 to CP012134, CP013075 to CP013096, CP013863 to CP013866, CP013868 to CP013907, and CP013951 (see Table S1 in the supplemental material). The versions described in this paper are the first versions. Raw sequence data are available from the NCBI Sequence Read Archive, organized under a BioProject with accession number PRJNA279196.
We thank Leonard Mayer, Conrad Quinn, Stacy Martin, Tami Skoff, and Connie Lam (CDC) for insightful discussions that contributed to the development of this work, Christine Miner (CDC) for isolate metadata management, and the Enhanced Pertussis Surveillance/Emerging Infection Program Network sites and other state health departments for contributing isolates.
This work was supported by internal funds.
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Supplemental material for this article may be found at https://doi.org/10.1128/JB.00806-16.