|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Proteolytic Clostridium botulinum is the causative agent of botulism, a severe neuroparalytic illness. Given the severity of botulism, surprisingly little is known of the population structure, biology, phylogeny or evolution of C. botulinum. The recent determination of the genome sequence of C. botulinum has allowed comparative genomic indexing using a DNA microarray.
Whole genome microarray analysis revealed that 63% of the coding sequences (CDSs) present in reference strain ATCC 3502 were common to all 61 widely-representative strains of proteolytic C. botulinum and the closely related C. sporogenes tested. This indicates a relatively stable genome. There was, however, evidence for recombination and genetic exchange, in particular within the neurotoxin gene and cluster (including transfer of neurotoxin genes to C. sporogenes), and the flagellar glycosylation island (FGI). These two loci appear to have evolved independently from each other, and from the remainder of the genetic complement. A number of strains were atypical; for example, while 10 out of 14 strains that formed type A1 toxin gave almost identical profiles in whole genome, neurotoxin cluster and FGI analyses, the other four strains showed divergent properties. Furthermore, a new neurotoxin sub-type (A5) has been discovered in strains from heroin-associated wound botulism cases. For the first time, differences in glycosylation profiles of the flagella could be linked to differences in the gene content of the FGI.
Proteolytic C. botulinum has a stable genome backbone containing specific regions of genetic heterogeneity. These include the neurotoxin gene cluster and the FGI, each having evolved independently of each other and the remainder of the genetic complement. Analysis of these genetic components provides a high degree of discrimination of strains of proteolytic C. botulinum, and is suitable for clinical and forensic investigations of botulism outbreaks.
The species Clostridium botulinum consists of a group of four physiologically and phylogenetically distinct Gram-positive obligately anaerobic bacteria that share the common feature of producing the highly potent botulinum neurotoxin . Organisms belonging to two of these groups are associated with the majority of cases of human botulism. C. botulinum Group I (proteolytic C. botulinum) is a mesophilic organism that is responsible for foodborne botulism, wound botulism, adult intestinal botulism and infant botulism. C. sporogenes is considered to be a non-toxigenic version of proteolytic C. botulinum . C. botulinum Group II (non-proteolytic C. botulinum) is a psychrotrophic organism associated with most cases of foodborne botulism not attributed to Group I [3,4]. The botulinum neurotoxins are the most potent toxins known, with as little as 30–100 ng constituting a potentially fatal dose , and are considered to be a bioterrorism threat .
Seven major types of botulinum neurotoxin (types A to G), and a significant number of sub-types have been described. For example, four sub-types of type A toxin (termed A1, A2, A3, A4) have been identified [7-9]. Sub-types are defined as differing by at least 2.6% at the amino acid level [7,10]. Proteolytic C. botulinum strains form neurotoxin of types A, B, or F, and dual-toxin forming strains have also been described . Additionally, some strains possess two neurotoxin genes, but only form one active neurotoxin. For example, A(B) strains possess a type A and type B neurotoxin gene, but only form type A neurotoxin. Non-proteolytic C. botulinum strains form a single neurotoxin of types B, E, or F. Each neurotoxin protein comprises a light chain and heavy chain. The light chains possess endopeptidase activity and cleave proteins in the SNARE complex leading to flaccid muscle paralysis, and potentially respiratory failure .
The neurotoxin genes are associated with other genes within the neurotoxin cluster, and two major cluster types are recognised. The most studied neurotoxin cluster in proteolytic C. botulinum is termed the ha plus/orf-X minus cluster. It is commonly associated with type A1 and type B neurotoxin genes [9,12,13], and is present in the genome of the sequenced type A1 strain ATCC 3502 used as a hybridisation reference in this work . This cluster comprises genes for the neurotoxin (cntA), three haemagglutinins (HA) (cntC, cntD, cntE), non-toxic-non-haemagglutinin (NTNH) (cntB), and a positive regulatory protein (cntR). The second cluster type is called the ha minus/orf-X plus cluster. In the case of proteolytic C. botulinum, this cluster is most frequently associated with type A2, A3, A4 and F toxin genes, and the type A1 gene in A(B) strains [9,12,13]. This cluster includes genes for the neurotoxin, NTNH and CntR (historically also known as p21 [9,13]), lacks the three genes encoding HA, and additionally contains a group of three open reading frames (orf-X1, orf-X2, orf-X3) and a single CDS (coding sequence) (p47) all of unknown function.
The genome sequence of proteolytic C. botulinum strain ATCC 3502 (NCTC 13319, Hall 174) has been recently completed, and consists of a chromosome (3.9 Mbp) and plasmid (16.3 kbp), which contain 3,650 and 19 coding sequences (CDSs), respectively . A DNA microarray was designed based on this sequence, and initial tests revealed that two prophages and a plasmid present in the genome of strain ATCC 3502 were absent from 11 test strains of proteolytic C. botulinum and C. sporogenes, and that the DNA microarray could be used to discriminate between strains of proteolytic C. botulinum . The 11 test strains shared a minimum of 84% of the CDSs of ATCC 3502, but were significantly diverged from other sequenced clostridial species, demonstrating the wide phylogenetic distance between different clostridia .
The flagellar glycosylation island (FGI) also showed evidence of diversity between strains of proteolytic C. botulinum . The ATCC 3502 genome contains a large putative FGI comprising CDSs CBO2666-2729. These are flanked by the CDSs for flagellar structural proteins FlgB (CBO2665), FliD and the flagellin structural subunits FlaA1 (CBO2730) and FlaA2 (CBO2731). The FGI can be divided into two distinct regions . CBO2678-2689 are CDSs similar to those involved in capsular polysaccharide biosynthesis in Group B Streptococcus (designated FGI-I, flanked by putative flagellin structural genes CBO 2666 and CBO 2695), whereas CBO2696-2729 represent CDSs with sequence similarity to those involved in the modification of Campylobacter jejuni flagellins with nonulosonic acids (designated FGI-II, CBO2696-CBO2729) .
In order to extend our understanding of phylogenetic relationships and the biology of proteolytic C. botulinum, an extensive comparative genomic indexing study has been carried out involving 58 strains of proteolytic C. botulinum, 2 strains of non-proteolytic C. botulinum, and 3 strains of C. sporogenes using DNA microarrays based on the genome sequence of strain ATCC 3502. We have assessed the evolution of the neurotoxin gene and cluster and flagellar glycosylation island (FGI) in relation to the remainder of the genetic complement. We have also identified important links between CDSs contained within the FGI and sugars associated with post-translational modification of flagella, and discovered a new neurotoxin A sub-type associated with UK wound botulism cases.
C. botulinum and C. sporogenes strains used in this work, together with the type of neurotoxin formed, their origin, source and date of isolation are listed in Table Table1.1. Before use, all strains were checked for purity (consistent colony morphology) and lack of contamination by growth on PYGS plates under both aerobic and anaerobic atmospheres . Proteolytic activity was determined by growth on Reinforced Clostridial Medium (RCM) containing 5% (w/v) skim milk  and lipase activity on McClung Toabe egg yolk medium . Strains were also checked for presence of type A, B and F neurotoxin genes by PCR using 100 ng genomic DNA as template with primer pairs NKB-1 (5'-GATACATTTACAAATCCTGAAGGAGA-3') and NKB-5 (5'-AACCGTTTAACACCATAAGGGATCATAGAA-3') which generate a 2278 bp PCR product for the type A neurotoxin gene; B-1A (5'-GATGGAACCATTTGCTAG-3') and B2-D (5'-AACATCAATACATATTCCTGG-3') which generate a 1284 bp PCR product for the type B neurotoxin gene ; and BONTFF2 (5'-GTGCTTATTATGATCCTAATTATTTAACC-3') and BONTFR2 (5'-CCATACTTCCATTGAAAATAATCTTTATA-3') which, using the same reaction conditions, give a 765 bp PCR product for the type F neurotoxin gene (data not shown). The type(s) of neurotoxins formed by each strain was established by sero-neutralisation and the mouse bioassay [19,20].
Genomic DNA was purified from exponentially growing cells, digested with Sau3A1 and labelled with fluorescent nucleotides as previously described  except that Cy5- or Cy3-dUTP (GE Healthcare, UK) was substituted for Cy5- or Cy3-dCTP. The isolation of plasmid DNA followed the method outlined by O'Sullivan and Klaenhammer . For restriction enzyme analysis, the manufacturer protocols (New England BioLabs, USA) were followed with the addition of spermidine to a final concentration of 4 mM. Digests were analyzed by standard gel electrophoresis in 1.5% agarose.
The microarray  included probes for 3433 genomic CDS, for 19 CDSs of plasmid pBOT3502, and for neurotoxin-associated genes absent in strain ATCC 3502. These included probes for CDSs of the gene clusters commonly associated with neurotoxins type A2, A3, A4 and F (i.e. p47, orf-X1, orf-X2, orf-X3 and lycA); for cntR of type F; and two probes each (N- and C-terminus probes) for cntA (neurotoxin CDS) of types A, B, and F. Probes were designed using GenBank database sequences (available in July 2005), and PCR amplified using DNA from strains NCTC 9837 (p47, orf-X1, orf-X2, orf-X3 and lycA), Langeland (cntR/F, cntA/F-N and cntA/F-C), ATCC 3502 (cntA/A-N and cntA/A-C), and NCTC 3815 (cntA/B-N and cntA/B-C). The microarray probe to the haemagglutinin gene cntC (CBO0803; previously ha34) failed to give a satisfactory signal and these microarray data were excluded from the analysis. The overall extent of genome coverage is similar to that achieved previously for other DNA microarrays [22-24]. GenBank accession numbers for genomes used in this work are: A1 strain ATCC 3502, AM412317; ATCC 3502 plasmid pBOT3502, AM412318; A1 strain ATCC 19397, CP000726; A3 strain Loch Maree, CP000962; F strain Langeland, CP000728; A1(B) strain NCTC 2916, NZ_ABDO00000000 (genome in progress); strain Bf, NZ_ABDP00000000 (genome in progress); C. sporogenes strain ATCC 15579, NZ_ABKW00000000 (genome in progress). The microarray design has been deposited with ArrayExpress (accession number A-MEXP-791).
Each experiment combined 2 μg Cy5-dUTP-labelled ATCC 3502 (reference) DNA and 2 μg Cy3-dUTP-labelled test DNA, and was performed on a minimum of four probe set replicates as described previously . DNA microarrays were scanned using an Axon GenePix 4000B microarray laser scanner (Axon Instruments, CA, USA). The data from detected features was initially processed using the GenePix Pro v.6.0 software supplied with the scanner.
The R package arrayMagic v.1.14.0  was used to assess the quality of the hybridisations by generating a diagnostic plot showing the pairwise similarities between all hybridisations. The pairwise similarity score (Sab) was calculated by arrayMagic via Sab = MADi(Xia - Xib), where for each pair of arrays (a and b) Xia is the log-ratio of the i-th probe on the a-th array, and the MAD (median absolute deviation) is taken over all CDSs. The hierarchical clustering diagram generated used the similarity scores as a measure of the 'distance' between arrays. In this way the fidelity of the microarray technical replicates could be assessed (arrayMagic's R script, experiment description file and diagnostic plot are available on request). The data for replicates that did not group together were discarded and the hybridisation experiments repeated with a fresh preparation of genomic DNA. Array data were further analysed using the GeneSpring GX package (Agilent Technologies) using Lowess normalisation. In order to correct for uneven printing or for probes which routinely gave a high or low signal, data were further normalised by using as a control hybridisation data from ATCC 3502 × ATCC 3502 dye-swap experiments (four microarrays) on a per CDS basis.
Pearson Correlation coefficients were calculated for the normalised signal ratios associated with probes for all chromosomal and pBOT3502 CDSs and used to create a similarity matrix for all 61 strains of proteolytic C. botulinum and C. sporogenes. The similarity matrix was subjected to the average linkage clustering method using GeneSpring GX software.
The data generated by probes for neurotoxin cluster genes not found in ATCC 3502 were processed separately as there was no competing reference DNA during hybridisation. Whereas a signal channel ratio of 0.55 was taken as the cut-off between a presence or absence of hybridisation for chromosomal genes, a ratio greater than 5.0 was taken as a positive hybridisation for CDSs not in ATCC 3502. This gave results that agreed well with known genome sequences of C. botulinum in the GenBank database. Data for probes to the ATCC 3502 neurotoxin gene cluster itself used a cut-off point of 0.40 to compensate for the fact that all hybridisations had been performed using ATCC 3502 DNA as the reference material.
The microarray data were validated for biological significance using CDSs within the clostridial flagellar glycosylation island (FGI) and plasmid pBOT3502. The DNA sequence of 28 CDSs from the FGI-I (Figures (Figures11 and and2)2) of ATCC 3502 and proteolytic C. botulinum type F strain Langeland, matched by annotation using genomic context and BlastP, was compared to the signals observed by microarray analysis. The highest sequence similarity between two homologous CDSs (CBO2682 and CLI_2747) corresponding to an absence of microarray hybridisation was 84.8%. The lowest sequence similarity between two homologous CDSs (CBO2683 and CLI_2748) that hybridised to the microarray was 85.7%, giving a minimum value of approximately 85% sequence identity between CDSs for a positive microarray hybridisation result. A similar percentage was previously reported for studies with Candida  and Helicobacter pylori . Further validation was carried out by analysis of data for the 19 probes to the plasmid pBOT3502. Using a cut-off value of 0.30 (because of very high signals), only one strain, F9801A, gave a positive microarray signal for all 19 CDSs (Figure (Figure3).3). Subsequent tests demonstrated that this strain, but not two others that were tested, contained a plasmid that shared identical restriction sites to that of pBOT3502 (Figure (Figure4).4). Additionally, pBOT3502 contains five CDSs (CBOP15–CBOP19) that are dedicated to the synthesis and secretion of the bacteriocin, boticin . However apart from F9801A, no other strain gave a microarray signal for these probes, including C. botulinum strain 213B. This strain carries a plasmid bearing the genes for boticin B , so might have been expected to give a positive signal. However, alignment of the 1 kb sequence from strain 213B with that of pBOT3502 showed that sequence identity over this region, spanning pBOT3502 CDSs CBOP16 and CBOP17, was only 52.1% which would fail to give a positive microarray signal.
Flagellin proteins were isolated  and mass spectrometry studies of intact flagellin proteins were carried out as described in earlier studies [30,31]. In some cases a large precipitate was observed in dialysed flagellin preparations. Protein isolates were evaporated to dryness in a Savant SpeedVac (Thermo Fisher Scientific UK) before resuspending in 5–10 μl of formic acid. The sample was agitated gently to dissolve protein and diluted 10-fold with hexafluoroisopropanol. Samples were infused into a hybrid quadrupole time-of-flight mass spectrometer (Micromass Q-TOF2, Waters Corporation, MA, USA) at a flow rate of 0.5–1.0 μl/min [30,31]. Top down mass spectrometry experiments were performed as described by , using argon collision gas with collision energies ranging from 20–30V.
To lower the risk of PCR-based errors, the A5 genes were sequenced using non-cloned PCR products. Initial 3.8 kb PCR products of the majority of the gene CDS were generated using a LongRange PCR kit (Qiagen, UK), with primers BONTAF1 (5'-GCAACCAGTAAAAGCTTTTAAAATTC-3'), BONTAR1 (5'-CCATCCATCATCTACAGGAATAAA-3') and 100 ng genomic DNA as template. PCR products were purified using DyeEx 2.0 spin columns (Qiagen). Sequencing was carried out using an AbiPrism 3730 capillary sequencer. Sequence of the entire 3.8 kb PCR product was achieved by designing primers using available sequence data and by 'walking' forward on both strands. Comparison of the 3.8 kb sequence with published examples of C. botulinum neurotoxin genes showed that the A5 neurotoxin sub-type was a close relative of the A1 sub-type, which implied a similar neurotoxin locus structure. Therefore to amplify DNA containing the 5' and 3' ends of the A5 neurotoxin genes a series of PCR primers were designed that would recognise the cntB gene (encoding NTNH) and the transposase that flank the A1 neurotoxin gene of ATCC 3502 (CBO0805 and CBO0807 respectively). PCR was performed using 'outward facing' primers recognising the 3.8 kb sequence combined with these two sets of primers. PCR products that were of the expected size were sequenced. All sequencing fragments were assembled using the ContigExpress programme of the Vector NTI Advance 10 software package (Invitrogen). Comparison of the completed A5 neurotoxin gene sequence with that of published examples of other C. botulinum neurotoxin genes together with phylogenetic tree construction was carried out using the AlignX programme of this package. The A5 neurotoxin gene of four strains associated with UK cases of wound botulism was sequenced and found to be identical.
A representative of the sub-type A5 neurotoxin gene sequence from wound botulism strain H0 4402 065 was deposited in GenBank (accession number EU679004). Microarray data have been deposited with Array Express (accession number E-MEXP-1637).
The 61 strains of proteolytic C. botulinum and C. sporogenes tested in the present study were selected to represent diverse origins. They had originally been isolated at different times over a period of more than 80 years, from the environment (17 strains (including unknowns)) or associated with various forms of botulism (foodborne (20 strains), infant (17 strains), and wound (4 strains)). The strains were of different toxin types; type A toxin gene (17 strains), type B toxin gene (16 strains), type F toxin gene (3 strains), dual toxin genes (22 strains), and no toxin gene (3 strains of C. sporogenes) (Table (Table1).1). The CDS content of the 61 test strains was indexed in relation to the genome of proteolytic C. botulinum strain ATCC 3502 (Figure (Figure1).1). The dendrogram and heatmap (Figure (Figure1)1) were derived from this whole genome analysis, and show that all the strains of proteolytic C. botulinum and C. sporogenes share a high degree of genetic relatedness (e.g. clustering distance or branch-lengths in the dendrogram were short with a high proportion of shared CDSs in the heatmap (coloured yellow)). Most major branch points in the dendrogram occurred at distance measurements of between 0.20 and 0.44. A distance measurement value of 0.30 separated the 61 strains of proteolytic C. botulinum and C. sporogenes into nine clades (excluding ATCC 3502) (Figure (Figure1).1). The strains did not group together according to the location, environment, time of isolation, or the type of botulism with which they were associated (Figure (Figure1).1). This lack of grouping probably reflects the wide range of sources of the strains, and has been reported previously by workers using other typing methods [8,32]. The predominance of yellow shading in the heatmap indicates that the 11 strains in clades 7 and 8 (Figure (Figure1)1) were most closely related to the reference strain (ATCC 3502). For example, they shared the same FGI. While nine of fourteen type A1 neurotoxin strains (as ATCC 3502) were present in these two clades, a type B and type F strain were also present. Indeed, most clades contained strains of more than one toxin type (or sub-type), and most toxin types (or sub-types) were represented in more than one clade, suggesting that the evolution of the neurotoxin genes has not paralleled that of the remainder of the genetic complement. For example, clade 3 contains nine type B strains, one type A1 strain, two type A2 strains, two type F and four type A5(B) strains (the novel type A5(B) strains are described below), and clade 9 contained two type B strains and three C. sporogenes strains (Figure (Figure1),1), confirming the close relationship between proteolytic C. botulinum and C. sporogenes (e.g. [2,8,14]). Two clades, however, contained strains of just one toxin type. Clade 4 contained eight closely-related North American-isolated type A1(B) strains, and clade 7 comprised seven closely-related type A1 strains (Figure (Figure1).1). Clade 5 contained a single strain (NCTC 2012, Loch Maree) that forms type A3 toxin. Interestingly, other genomic indexing methods (MLST, AFLP, VNTR) also found this strain to be unique and well separated from other strains of proteolytic C. botulinum [8,33,34]. In addition to the nine clades identified, further sub-groupings were identified within each clade, often of the same toxin type (or sub-type). Some strains appeared highly similar to each other when compared to the genome of ATCC 3502. These strains included the three type Bf strains isolated from two patients and food following a foodborne botulism outbreak in Quebec that grouped closely together within clade 6, and the four strains associated with wound botulism in the UK within clade 3 (Figure (Figure1).1). Most of the differences in microarray data between the three type Bf strains were distributed around the signal channel ratio cut-off point of 0.55, suggesting that these apparent differences may reflect background noise associated with this type of analysis. Indeed, it is likely that these three Bf strains are identical, as they were isolated from a pâté and clinical samples from the same outbreak. On the other hand, the wound botulism strains showed some clear differences in their genetic content. Other genomic indexing methods (e.g. MLST, PFGE, AFLP, VNTR) have given a broadly similar pattern to that found in the present study, with groups of small numbers of closely-related strains generally of the same toxin type grouping together, with several distinct groups for each toxin type [8,32-34]. There are, however, a number of interesting anomalies that might be interpreted as evidence for horizontally acquired genetic information, and therefore worthy of further study, for example the type B strain 2345 that groups most closely with the C. sporogenes strains.
It is estimated that the core gene set for all 61 strains of proteolytic C. botulinum and C. sporogenes tested was 2155 CDSs (Figure (Figure5).5). This is approximately 63% of the CDSs of ATCC 3502 represented by probes on the microarray, and considerably higher than the value of 20% previously reported for 75 strains of C. difficile . This further confirms the close relationship of proteolytic C. botulinum and C. sporogenes and indicates that exchange of genetic information with other species has occurred less frequently than in C. difficile. Apart from the neurotoxin gene cluster itself, which although significant in terms of biological impact, represents a very small part of the genome, four main areas of divergence were identified; the plasmid (pBOT3502), the flagellar glycosylation island (FGI) and the two prophages (Figure (Figure1).1). Together these account for approximately 4.6% of the DNA (plasmid plus chromosome) of ATCC 3502. It was previously estimated that two type A1 strains shared 95–96% of their CDSs with ATCC 3502 , but the two strains included in this previous study are now revealed to be very close relatives. Indeed, it was estimated that the core gene set for the ten closely-related type A1 strains in clades 7 and 8 was 3055 CDSs, equating to 89% of the CDSs of ATCC 3502 (Figure (Figure55).
Additionally, two strains of non-proteolytic C. botulinum type E were tested, but too many CDSs were either absent or highly diverged for meaningful data to be derived (data not shown). It was previously reported that a strain of non-proteolytic C. botulinum type B and a strain of C. difficile were also too divergent to give a meaningful response on this microarray . The poor hybridisation of DNA from the three strains of non-proteolytic C. botulinum to the microarray reflects the wide evolutionary and phylogenetic distance between proteolytic C. botulinum and non-proteolytic C. botulinum. This is a direct result of the species "C. botulinum" being defined not on the basis of a close evolutionary or phylogenetic relationship, but on the basis of the disease caused .
The type A1 neurotoxin gene is normally present in the ha plus/orf-X minus type cluster, while the ha minus/orf-X plus cluster is more commonly associated with type A2, A3, A4 and F neurotoxin genes [9,36,37]. Twelve of the fourteen type A1 neurotoxin strains tested contained the ha plus/orf-X minus cluster, but in two strains (F9604A and MUL0109ASA) the type A1 neurotoxin gene appears to be in a ha minus/orf-X plus cluster (Figure (Figure6).6). This arrangement has also been recently reported for a small number of other type A1 strains [18,38]. In the present study, the genes (p47, orf-X1, orf-X2, orf-X3 and lycA) that are only present in the ha minus/orf-X plus cluster were always present together (26 strains), with no strain possessing only part of this cluster. The neurotoxin gene of the two type A2 strains (NCTC 9837 and ZK3) and one type A3 strain (NCTC 2012 – Loch Maree) was also present in a ha minus/orf-X plus cluster (Figure (Figure6),6), as expected [9,13,18,37]. Although the two type A1 ha minus/orf-X plus strains (F9604A and MUL0109ASA) had the same neurotoxin cluster as the type A2 and A3 neurotoxin-forming strains (Figure (Figure6),6), they were in different clades well separated from each other and from the other type A1 neurotoxin-forming strains (Figure (Figure1).1). Instead these two type A1 ha minus/orf-X plus strains grouped with a type Ba strain (CDC 657) (Figure (Figure1,1, clade 6). The type A neurotoxin gene in strain CDC 657 (type A4) is also in a ha minus/orf-X plus cluster . Since previous studies using AFLP, MLST and MVLA have shown two other closely-related type A1 ha minus/orf-X plus strains (CDC 297 and CDC 5328) also grouped with strain CDC 657 [8,33,34,38], it is likely that the four type A1 ha minus/orf-X plus strains are closely related. Strain CDC 657 may have recently acquired a type B neurotoxin gene, or the type A1 ha minus/orf-X plus strains may have lost a type B neurotoxin gene. It is noted that the neurotoxin genes in CDC 657 are present on a plasmid, while the type A gene in at least one type A1 ha minus/orf-X plus strain (CDC 5328) is located on the chromosome .
The 16 type B strains gave an almost identical hybridisation pattern, with all neurotoxin genes present in a ha plus/orf-X minus cluster (Figure (Figure6).6). This is consistent with previous reports [9,12]. Strain MRB had a weak signal for cntB (encoding NTNH), this may reflect the mosaic structure of cntB, and a previous genetic crossover event between two types of neurotoxin gene cluster [2,9,40]. The three type F strains gave a ha minus/orf-X plus pattern (Figure (Figure6),6), an arrangement consistent with that reported in the genome sequence for strain Langeland (CDSs CLI_0845 to CLI_0850). However, while strains Langeland and Walls 8G grouped together in the whole genome analysis (Figure (Figure1,1, clade 3), strain H461297F grouped with type A1 strains, providing further evidence that the neurotoxin gene clusters are not evolutionarily tied to their host organism [8,9,13].
The genes cntR/A1 and cntR/F (sometimes called p21) encode closely related sigma 70 factors involved in regulation of the neurotoxin genes [9,13]. The probe designed to be specific for cntR/A1 gave a positive result with all strains that possessed a neurotoxin gene in a ha plus/orf-X minus cluster (Figure (Figure6).6). Similarly the probe designed to be specific for cntR/F gave a positive result with all strains that possessed a neurotoxin gene in a ha minus/orf-X plus cluster (Figure (Figure6).6). The type of neurotoxin regulatory gene (cntR) present, therefore, is entirely in accordance with the type of neurotoxin gene cluster, but not with the type of neurotoxin gene.
Twenty-two strains tested in the present study possessed two distinct neurotoxin genes. Fourteen of the dual gene toxin strains possessed a type A1 and type B neurotoxin gene. Two of these strains (CDC 657 and CDC 588) form both neurotoxins, albeit in different proportions, while the other 12 strains appear to form only type A neurotoxin (Figure (Figure6).6). All these 14 strains gave an identical response in that they possessed complete ha plus/orf-X minus and ha minus/orf-X plus clusters. The microarray data cannot distinguish between dual toxin gene strains which carry a type A1 toxin gene in a ha minus/orf-X plus cluster, and a type B gene in a ha plus/orf-X minus cluster or vice-versa. However, as all type B neurotoxin genes have been associated with a ha plus/orf-X minus cluster (Figure (Figure6;6; [9,12]), the simplest explanation is that the dual toxin gene strains are in the former arrangement. This has been reported in strains NCTC 2916 and CDC 657 [9,36,41] and from a preliminary analysis of strains NCTC 11199, MDa10, 667 and CDC 588 [18,41]. The four strains that formed both type B and type F toxin showed a similar hybridisation profile to the A1(B) strains except that they possessed a type F toxin gene rather than a type A toxin gene. Again both the full ha plus/orf-X minus and ha minus/orf-X plus clusters are present (Figure (Figure6).6). It is likely that these strains possess a type F gene in a ha minus/orf-X plus cluster, plus a type B gene in a ha plus/orf-X minus cluster. This hypothesis is supported by (i) this is the pattern found in strains forming either type B or type F toxin, (ii) such an arrangement was indicated by a preliminary analysis of strain CDC 3281 , and (iii) was reported for a recently sequenced unnamed Bf strain [GenBank: NZ_ABDP00000000].
The present study included four strains of proteolytic C. botulinum (H0 4244 0055, H0 4402 065, H04464 107, H0 4068 0341) that formed type A neurotoxin, and had been isolated from patients presenting with wound botulism in different regions of the UK in 2004. Whole genome analysis revealed that these strains formed a sub-group within clade 3, distinct from other type A strains. Since the majority of strains forming type A neurotoxin clustered together within clades 4 or 7, this suggested the possibility that they might represent an evolutionary distinct group which could be sufficiently diverged to also produce a novel neurotoxin sub-type (Figure (Figure1).1). From the DNA sequence of the entire cntA coding region, a translation product could be predicted that comprised 1297 amino acid residues of a type A neurotoxin gene. The cntA/A gene sequences from all four strains were identical suggesting that these strains may derive from a common source. Comparison with published examples of neurotoxin A sub-types revealed that the wound botulism-derived cntA/A genes were distinct from toxin sub-types A1 – A4 (Figures (Figures7,7, ,88 and and9;9; Table Table2).2). Subtypes of cntA are defined by a minimum of 2.6% difference between amino acid sequences [7,10]. The closest relative of the wound botulism-derived cntA/A gene is the cntA/A1 gene (Table (Table2),2), and the new DNA sequence predicts a 2.9% difference (37 amino acid residues) between the wound botulism-derived cntA/A genes and the cntA/A1 genes, the latter tending to share approximately 99.8% identity between themselves (see Figures Figures77 and and88 for an alignment of amino acid sequence of all five sub-types). On this basis these wound botulism-derived cntA/A genes define a new sub-type, and should be termed cntA/A5. Furthermore, the four type A5(B) strains represented the only 'non-A1' neurotoxin-forming strains that possessed a type A neurotoxin gene in a ha plus/orf-X minus cluster (Figure (Figure6).6). Interestingly all four type A5 strains gave a positive signal with the C-terminal type B probe. Following a combination of DNA sequencing and PCR analysis, the presence of a near complete type B neurotoxin gene with the 5' end (i.e. N-terminus of protein) either absent or diverged from previously published examples was detected (data not shown). As such, these strains also represent the first published examples of type A(B) strains that lack the ha minus/orf-X plus cluster for neurotoxin genes. Since active type B toxin was not detected in the mouse test, they are designated as type A5(B). Wound botulism cases in the UK are associated with heroin abuse , and it is likely that the source of these strains of proteolytic C. botulinum is the same as the heroin, which comes from Afghanistan . This may indicate that previously unknown botulinum neurotoxin types are present in this part of Asia; the majority of published botulinum neurotoxin gene sequences are from strains originating in Europe or North America.
The amino acid residue differences that distinguish the A5 sub-type from the four other type A sub-types are scattered throughout its length (Figures (Figures77 and and8;8; [7-9]). The N-terminal eight amino acid residues are involved in binding to the neuronal cell plasma membrane . Significantly the A5 neurotoxin has a leucine at position 2, in contrast to the usual proline, an amino acid known to cause marked conformational changes in peptide secondary structure. The C terminus of the light chain, especially residues 398–448 is important for solubility, stability and catalysis , but only one residue (E444) close to the protease nicking site differs in this region. Similarly, of the heavy chain residues that are proposed to build the lactose and sialyllactose-binding pockets needed for ganglioside binding , only L1278 has been changed (to an F). It is tempting to predict from this in silico study that the gene product of cntA/A5 will share a similar toxicity to that of cntA/A1, although the fact that at least three residues known to be functionally important are different may have important implications.
The type B toxin producing strain 2345 had a weak signal for all four neurotoxin-associated probes (Figure (Figure6),6), and also groups together with the three C. sporogenes strains in the whole genome analysis (Figure (Figure1).1). These observations support a hypothesis that strain 2345 may represent a strain of C. sporogenes which has recently acquired part of the (or a diverged but intact) neurotoxin gene cluster in the recent evolutionary past. Interestingly, two of the three C. sporogenes strains, which were expected to be completely negative for all neurotoxin-associated probes, gave a weak signal to the cntR/A1 probe (Figure (Figure6).6). This could be due to the presence of a distantly related or partial cntR gene, implying that these C. sporogenes strains may either represent a descendent of C. botulinum that have lost most of their ha plus/orf-X minus neurotoxin gene cluster, or may have acquired a neurotoxin gene cluster by horizontal gene transfer (as postulated for strain 2345), but then subsequently lost most of it. Although not the same strain as the three used in this work, BLAST searches of the predicted peptides of the (unfinished) C. sporogenes strain ATCC 15579 genome sequence showed that of several proteolytic C. botulinum examples of this gene (and one non-proteolytic C. botulinum example, that of strain Eklund 17B), CntR of ATCC 3502 gave the highest percentage identity (48%) over the longest unbroken stretch of peptide sequence. Whereas this tends to support the microarray data, the stretch of sequence was only 27 amino acid residues in length, so genome sequence analysis of more C. sporogenes strains would be needed to further investigate this interesting observation.
It is evident that in strains of proteolytic C. botulinum, the distribution of neurotoxin genes and neurotoxin cluster type are not consistent with the whole genome analysis (Figure (Figure6,6, Figure Figure1).1). This is consistent with previous reports by other genomic indexing methods (e.g. 16S rrn, PFGE, AFLP, MLST [2,8,32,33]). The evolutionary patterns of neurotoxin and associated genes within the neurotoxin cluster are also incompatible, and are likely to have arisen from several distinct recombination events. For example, the present study has confirmed earlier reports [18,38] that in type A1 strains, the neurotoxin genes may be located in a ha plus/orf-X minus cluster or in a ha minus/orf-X plus cluster. It was also found that the cntR gene correlated with neurotoxin cluster type, rather than neurotoxin gene. Previous reports have identified that NTNH-encoding genes also correlated with neurotoxin cluster type rather than neurotoxin gene, and that the middle of the NTNH gene may be a hot-spot for recombination events within the neurotoxin cluster [2,9,13]. Putative insertion sequence (IS) elements located close to the neurotoxin cluster and the localisation of neurotoxin genes on large plasmids may also have played a role in mobilisation and gene transfer of neurotoxin and associated genes [9,39].
Microarray analysis of the CDSs within the FGI separated strains into six divisions. Divisions 1 and 2 had similar profiles, with division 1 missing some CDSs contained within FGI-II. Divisions 3–6 were all missing large sections of FGI-II, with division 5 also missing CDSs within FGI-I (Figure (Figure2).2). The structure of these divisions indicates that, as seen for the neurotoxin cluster genes, the evolution of the FGI may have occurred independently of the remainder of the genetic complement, with most divisions containing strains of more than one toxin type and from more than one clade. Divisions 1–3 each contained strains of at least four toxin types (or sub-types) that belonged to three or more clades, while division 4 contained only type B strains from clade 3, and division 5 comprised only the strains in clade 9 (type B and C. sporogenes strains). Division 6 contained just one strain, 17A. The genetic variation highlighted by these divisions (Figure (Figure2)2) forms the basis for a typing method for proteolytic C. botulinum .
The genetic repertoire of the FGI indicated by the microarray analysis suggested that the glycan biosynthetic capacity of these C. botulinum strains may vary (Figure (Figure2).2). Indeed, it has been shown that strains differ in their FlaA glycan structure, and it is proposed that FGI-II is involved in this process [29,31]. Flagellin proteins were isolated from representative strains in each division to determine the nature of the glycan produced and its correlation with FGI microarray profiles. Top down MS analysis of intact flagellin protein revealed diversity in glycan composition amongst isolates [30,31]; the masses of which are shown in Figure Figure2.2. Flagellin from the majority of strains in division 1 harboured a glycan oxonium ion at m/z 259. Further characterisation of this glycan by MS/MS showed fragmentation ions at m/z 200.1, 182.1, 158.1, 154.1, 126.1 and 112.1, characteristic of a di-N-acetylhexuronic acid, previously identified as part of a trisaccharide modification on Methanococcus voltae flagellin . Many of the flagellins of division 2 strains carried a glycan oxonium ion at m/z 301. The MS/MS spectrum of this ion also shared fragmentation ions with a di-N-acetylhexuronic acid, but with the increased mass likely to correspond to the addition of a third acetyl group (data not shown). A glycan oxonium ion at m/z 418 was detected as the FlaA modification on all division 3 strains examined. This FlaA glycan has been fully characterized in strain FE9909ACS as a novel legionaminic acid derivative, Leg5Ac7NMeGlu . Strains 920A276 and CDC 15044 from division 4 possessed flagellins with glycan oxonium ions at m/z 417 and 431, which shared glycan oxonium ion fragmentation patterns typical of nonulosonic acid sugars (data not shown). The flagellin of the division 5 strain FE0507BLP had a glycan oxonium ion at m/z 317, which had the characteristic MS/MS fragmentation pattern of the nonulosonic acid sugars pseudaminic and legionaminic acid (data not shown). These sugars have been structurally characterised from the flagellins of Campylobacter jejuni , Campylobacter coli  and Helicobacter pylori . Taken together, these observations show that differences in the FGI microarray profiles may be reflected in the mass of glycan oxonium ions that modify the flagellin. Interestingly, top down mass spectrometry analysis of FlaA from strain 17A did not produce any marker ions characteristic of glycan, although the mass of the protein is greater than could be predicted from its DNA sequence. This indicates that it too is probably post-translationally modified . In this case a 'bottom up' mass spectrometry analysis of flagellar tryptic peptides may be required to identify the glycan moiety.
A representative of division 3, which appears to produce the novel legionaminic acid derivative, Leg5Ac7NMeGlu, is the type F strain Langeland, the genome of which has been recently sequenced. Comparison of FGI sequences of both Langeland and ATCC 3502 showed that CDSs of FGI-I shared at least 80% identity, while FGI-II was highly divergent and was 30 kb smaller in strain Langeland (Figure (Figure2).2). Homologues to the biosynthetic genes for legionaminic acid synthesis in Campylobacter coli have been identified in the FGI-II region of the Langeland genome . The definitive confirmation that CDSs in FGI-II are responsible for biosynthesis of the glycan found on C. botulinum strain Langeland FlaA, however, awaits further genetic analysis.
Previously, nonulosonic acid sugars such as legionaminic acid and pseudaminic acid have been identified as the post-translational modification of flagellins in the Gram-negative gastrointestinal pathogens, Campylobacter and Helicobacter . In these bacteria, the glycosylation of the flagellin is essential for filament assembly and glycan modifications have been shown to play a role in pathogenesis [53,54]. The presence of nonulosonic acid sugars in numerous strains of C. botulinum may have an important bearing on its ability to establish a gut infection and thereby cause infant or adult intestinal botulism. Although the present study has not correlated the distribution of specific flagellin glycan modifications with the type of botulism caused, this property may enable a strain to bring about infant/adult intestinal botulism at a lower infectious dose than that of strains lacking these flagellin modifications. A comparative genomic analysis of Campylobacter jejuni identified distinct distributions of flagellar glycosylation genes (cj1321–cj1326) that were present only in strains associated with colonisation of livestock . The hypothesis was developed that the type of flagellar glycosylation genes in Campylobacter jejuni strains conferred a survival advantage to these strains within livestock, offering a possible explanation for the host specificity of some Campylobacter jejuni strains. It remains to be established whether diversity in flagellar glycan biosynthetic capacity in C. botulinum is similarly related to host specificity and the colonization ability of isolates.
The most important aspects of the biology and evolution of proteolytic C. botulinum have been highlighted by this study, particularly in relation to the neurotoxin and its associated cluster and the FGI. The close relationship with C. sporogenes, and very distant relationship with non-proteolytic C. botulinum have been confirmed. Proteolytic C. botulinum and non-proteolytic C. botulinum are phylogenetically distinct organisms that coincidentally share type B and type F neurotoxin genes. These genes are of such sequence similarity as to obviously share a recent common ancestor, and appear therefore to have crossed the species barrier. Intriguingly, type A and type E neurotoxin genes seem to be mutually exclusive and are each restricted to just one of these species. The genome of proteolytic C. botulinum appears to be relatively stable, and strains sequenced to date display a high degree of synteny (data not shown). There are, however, variable regions, and we have presented evidence for the independent horizontal transfer of genes encoding the neurotoxin cluster and FGI, compared to the remainder of the genetic complement. Transfer of neurotoxin and associated genes may be associated with a hot-spot for recombination within the NTNH, closely associated IS elements, and plasmids. Further investigations of unexpected toxin or FGI types within clades may be particularly interesting, and reveal more about the acquisition or loss of genetic material. For example, while most type A1 strains grouped together according to whole genome and FGI analysis, four appeared to be distinct. Two of these type A1 strains, FE9604A and MUL0109ASA were closely related to each other, with the toxin gene in a ha minus/orf-X plus cluster. Strains 17A and 96A were both ha plus/orf-X minus strains, but appear to be different to each other and all the other type A1 strains by whole genome and FGI analysis. Interestingly, strain 96A was also well separated from other type A1 strains by PFGE . The sequencing of further strains (which is rapidly becoming affordable for most laboratories) is a particularly attractive way forward, as unlike microarray analysis, which can only highlight CDSs that are present or absent in a test strain, it will provide information not only on what has been inserted or lost, but where on the genome this has taken place. Indeed the genomes of several of the strains used in this study have recently been sequenced, and those with slightly larger genomes typically also carry approximately 300–600 novel genes with respect to the ATCC 3502 strain used as a reference in this work (data not shown).
A number of typing tools have been used for the molecular characterisation of proteolytic C. botulinum. Some (e.g. ribotyping, 16S rrn sequencing) can be used to identify the organism, but are not particularly effective at discriminating between strains [8,55]. Others (e.g. PFGE, MLST, AFLP, VNTR, fla sequencing, DNA microarrays) are able to discriminate between strains [8,14,29,32-34]. The present study and previous work  have demonstrated that comparative genomic indexing using a DNA microarray based on the genome sequence of ATCC 3502 is an effective tool to discriminate strains of proteolytic C. botulinum. Advantages of microarrays are that they can infer evolutionary relationships better than single/multi-locus methods and additionally provide valuable information on the genome content of tested strains, thereby providing an insight into the biology and evolution of the organism. The present microarray is suitable for the forensic analysis of strains of proteolytic C. botulinum, including investigations of bioterrorism associated events. A second generation DNA microarray could be developed for this purpose based on the variable regions identified between a number of sequenced strains, and utilise printed rather than spotted microarrays.
MWP and JWA obtained funds for, initiated and conceived this study. ATC, CJP and DRM carried out the comparative genomic indexing work. ATC carried out the analysis with a contribution from MA. CJP, SMT and SML performed the FGI analysis. ATC and MWP coordinated the writing of the manuscript, with all authors providing critical feedback. All authors read and approved the final manuscript.
We thank Greg Sanders, Jeff Bussey and Luc Tessier for technical assistance, Kathie Grant for providing strains, and Carmen Pin, Sandra Stringer and Arnoud van Vliet, for helpful discussions. This work was supported by project CRTI-02-0091TA of the Defense Research and Development Canada CBRN Research and Technology Initiative, and by Health Canada and the Competitive Strategic Grant of the BBSRC.