We describe the development of a multilocus sequence typing (MLST) scheme for Corynebacterium diphtheriae, the causative agent of the potentially fatal upper respiratory disease diphtheria. Global changes in diphtheria epidemiology are highlighted by the recent epidemic in the former Soviet Union (FSU) and also by the emergence of nontoxigenic strains causing atypical disease. Although numerous techniques have been developed to characterize C. diphtheriae, their use is hindered by limited portability and, in some instances, poor reproducibility. One hundred fifty isolates from 18 countries and encompassing a period of 50 years were analyzed by multilocus sequence typing (MLST). Strain discrimination was in accordance with previous ribotyping data, and clonal complexes associated with disease outbreaks were clearly identified by MLST. The data produced are portable, reproducible, and unambiguous. The MLST scheme described provides a valuable tool for monitoring and characterizing endemic and epidemic C. diphtheriae strains. Furthermore, multilocus sequence analysis of the nucleotide data reveals two distinct lineages within the population of C. diphtheriae examined, one of which is composed exclusively of biotype belfanti isolates and the other of multiple biotypes.
Diphtheria has historically evoked fear and terror due to its slow suffocating death and previously unknown origin, but socioeconomic improvement and the introduction of mass immunization in the 1940s and 1950s led to its near-elimination in the developed world. However, diphtheria remains a global disease and is endemic in many countries. The World Health Organization (WHO) has recorded outbreaks throughout the world, including Afghanistan, Algeria, Iraq, Lao People's Republic, Mongolia, Papua New Guinea, Sudan, and Thailand (1). It is also a potentially resurgent infectious disease, exemplified in the 1990s by a notable epidemic in the newly independent states of the former Soviet Union (NIS), where vaccination had been employed since 1958 (33). At least 20 cases were reported beyond these countries, highlighting the potential threat of introduced strains from countries in which it is endemic and epidemic (29). Furthermore, according to serological surveillance studies, the proportion of susceptible individuals in vaccinated populations remains high (7, 10). Edmunds et al. estimated that there are inadequate protection levels in the United Kingdom for 70 to 75% of those aged 50 to 60 years old (7). Similar observations were recently reported for individuals who had followed the French vaccine recommendations (17).
It is therefore apparent that typing tools enabling global Corynebacterium diphtheriae surveillance are of great importance. Based upon their biochemical and morphological properties, four C. diphtheriae biotypes have been identified: mitis, gravis, intermedius, and belfanti (8). Several typing techniques for C. diphtheriae have been developed. Traditionally these techniques were based upon serologic, phage, and biotyping methods. However, since the methods provide limited resolution, molecular typing techniques, including amplified fragment length polymorphisms (AFLP) (4), random amplified polymorphic DNA (RAPD) (3, 24), multilocus enzyme electrophoresis (MEE) (28), spoligotyping (21), and pulsed-field gel electrophoresis (PFGE) (5), have been developed and show significant intraspecies genetic diversity. Recently a comparison of the different typing techniques was performed (6). It was shown that the most discriminative was ribotyping, the current “gold standard” typing method for C. diphtheriae. This method has identified 86 distinct ribotype patterns and clusters isolates associated with the former Soviet Union (FSU) outbreak (5, 11, 28). However, ribotyping is very dependent upon the use of a rigid standardized method, and without this, there are clearly difficulties in reproducibility (25). Additionally, typing methods based upon band matching do not clearly reveal the population structure or underlying evolutionary mechanisms of a given species.
Proposed in 1998, multilocus sequence typing (MLST) overcomes the problems encountered with ribotyping by directly indexing nucleotide variation within several core metabolic genes, thereby providing portable, reproducible, and high-resolution data appropriate for the evolutionary and epidemiological investigation of diphtheria (19). We describe the development of an MLST scheme to examine the genetic relationship between a temporally and geographically diverse collection of C. diphtheriae isolates.
As shown in Table 1, a total of 150 C. diphtheriae isolates were examined. They were obtained from the Centers for Disease Control and Prevention (CDC) (Atlanta, GA), the Health Protection Agency (HPA) (London, United Kingdom), the National Consiliary Laboratory for Diphtheria (Germany), and the National Institute of Public Health (Poland). The MLST scheme was validated by comparison to 29 previously ribotyped isolates encompassing 20 different ribotypes (5, 28). The collection is both temporally and geographically diverse, with isolates from 18 countries, covering a time period from 1957 to 2006. Of the 45 human cases with available clinical information, 28.9% were associated with carriage (n = 13), 42.2% were isolated from diphtheria (n = 19), 11.1% were isolated from cutaneous lesions (n = 5), 13.3% were isolated from patients with upper respiratory conditions, including pharyngitis (n = 1), tonsillitis (n = 4), and a sore throat (n = 1), and 4.3% (n = 2) were obtained from patients with both osteomyelitis and cutaneous lesions. Both toxigenic (n = 96) and nontoxigenic (n = 52) (toxigenic data were unavailable for two isolates) biotype gravis (n = 43), intermedius (n = 6), mitis (n = 85), and belfanti (n = 16) strains were examined.
Toxigenicity was determined by the Elek immunoprecipitation method (Elek test) in accordance with the WHO guidelines (9). All isolates were biotyped using the API Coryne kit (bioMérieux, Lyon, France) according to the manufacturer's instructions.
Potential housekeeping genes were identified by comparing the C. diphtheriae (1), Corynebacterium glutamicum (C. glutamicum) (15), and Corynebacterium efficiens (C. efficiens) (26) genome sequences using the Artemis Comparison Tool (ACT) and the Double ACT program, available at http://www.sanger.ac.uk/Software/ACT/ and http://www.hpa-bioinfotools.org.uk/pise/double_act.html, respectively. Amplification and nested sequencing primers were designed for the loci atpA, dnaE, dnaK, fusA, leuA, odhA, and rpoB (Table 2) using Primer3 (30). The sequencing primers for rpoB were previously described by Khamis and colleagues (16).
DNA was extracted primarily as described by Mothershed et al. (23). Each 25-μl PCR was carried out using 10 ng of chromosomal DNA, 5 μl Q solution (Qiagen, United Kingdom), 4.0 μl chromosomal DNA (5 to 20 ng/μl), 1.0 μl forward primer (10 pmol/μl), 1.0 μl reverse primer (10 pmol/μl), 2.5 μl 10× PCR buffer (Qiagen) (containing 15 mM MgCl2), 0.5 μl deoxynucleoside triphosphate (dNTP) solution (Qiagen) (10 mM [each] dNTP), 0.125 μl Taq polymerase (Qiagen, 5 U/μl), 0.5 μl MgCl2 (Qiagen) (25 mM), and PCR-grade water. All primer sets were designed to have similar melting temperatures, and reaction conditions were as follows: initial denaturation at 94°C for 1 min; 35 cycles of denaturation at 94°C for 1 min, primer annealing at 58°C for 1 min, and extension at 72°C for 2 min; and a final extension step at 72°C for 5 min. Amplicons were purified using the MiniElute UF plates (Qiagen, United Kingdom) according to the manufacturer's instructions and stored at −20°C.
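The reaction mix above implies a water volume and a set of final concentrations that are not stated explicitly. The following is a minimal arithmetic sketch in Python, not part of the published protocol. It assumes that the 15 mM MgCl2 figure refers to the 10× buffer stock and that the listed volumes are exact; the variable and function names are illustrative only.

```python
# Per-reaction volumes in microlitres, copied from the reaction description.
REACTION_VOLUME_UL = 25.0
COMPONENTS_UL = {
    "Q solution": 5.0,
    "chromosomal DNA": 4.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "10x PCR buffer": 2.5,
    "dNTP solution": 0.5,
    "Taq polymerase": 0.125,
    "MgCl2 (25 mM)": 0.5,
}


def final_concentration(stock, volume_ul, total_ul=REACTION_VOLUME_UL):
    """Dilute a stock concentration by volume_ul / total_ul (units unchanged)."""
    return stock * volume_ul / total_ul


# PCR-grade water makes up the remainder of the 25-ul reaction.
water_ul = REACTION_VOLUME_UL - sum(COMPONENTS_UL.values())

# ASSUMPTION: 15 mM MgCl2 is the concentration in the 10x buffer stock.
mgcl2_mm = (
    final_concentration(15.0, COMPONENTS_UL["10x PCR buffer"])
    + final_concentration(25.0, COMPONENTS_UL["MgCl2 (25 mM)"])
)

# 10 pmol/ul is equivalent to 10 uM, so the result is in uM.
primer_um = final_concentration(10.0, COMPONENTS_UL["forward primer"])
dntp_mm = final_concentration(10.0, COMPONENTS_UL["dNTP solution"])
taq_units = 5.0 * COMPONENTS_UL["Taq polymerase"]

print(f"water: {water_ul} ul, MgCl2: {mgcl2_mm} mM, primer: {primer_um} uM")
print(f"each dNTP: {dntp_mm} mM, Taq: {taq_units} U")
```

Under these assumptions the mix requires 10.375 μl of water per reaction and contains 2.0 mM MgCl2, 0.4 μM of each primer, 0.2 mM of each dNTP, and 0.625 U of Taq polymerase.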
Amplicon nucleotide sequences were determined by nested sequencing using the BigDye Terminator ready reaction mix, v3.1 (Perkin-Elmer Applied Biosystems, Foster City), following the manufacturer's protocol. The forward and reverse sequences of a given locus were edited, aligned, and trimmed to the desired length using the SeqManII software program (DNASTAR, Madison, WI).
Allelic numbers were assigned to each unique allele for a given locus. For each isolate, the allelic profile was generated by combining the allele numbers for each locus in the order atpA, dnaE, dnaK, fusA, leuA, odhA, and rpoB. A novel sequence type (ST) designation was given to all unique allelic profiles, while isolates with identical profiles belonged to the same ST.
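The assignment rule can be sketched in a few lines of Python. This is an illustration of the logic only, not the curated database procedure: numbers are handed out in order of first appearance, which is an assumption made for this sketch, and the function name `type_isolates` and the toy sequences are hypothetical.

```python
# Locus order used to build the allelic profile, as given in the text.
LOCI = ("atpA", "dnaE", "dnaK", "fusA", "leuA", "odhA", "rpoB")


def type_isolates(isolates):
    """Map each isolate to (allelic profile, ST number).

    isolates: dict of isolate name -> dict of locus -> trimmed allele sequence.
    ASSUMPTION: allele and ST numbers are assigned in order of first
    appearance; a real scheme takes them from a curated database.
    """
    allele_ids = {locus: {} for locus in LOCI}  # sequence -> allele number
    st_ids = {}                                 # allelic profile -> ST number
    results = {}
    for name, sequences in isolates.items():
        profile = []
        for locus in LOCI:
            table = allele_ids[locus]
            seq = sequences[locus].upper()
            if seq not in table:
                table[seq] = len(table) + 1     # new unique allele
            profile.append(table[seq])
        profile = tuple(profile)
        if profile not in st_ids:
            st_ids[profile] = len(st_ids) + 1   # new unique profile -> new ST
        results[name] = (profile, st_ids[profile])
    return results


# Toy data: isolate C differs from A and B only at fusA.
base = {locus: "ACGT" for locus in LOCI}
toy = {"A": dict(base), "B": dict(base), "C": dict(base, fusA="ACGA")}
typed = type_isolates(toy)
for name, (profile, st) in typed.items():
    print(name, profile, f"ST-{st}")
```

In the toy data, isolates A and B share an allelic profile and therefore an ST, while the single fusA difference gives isolate C a new allele number and a new ST.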
Tree concordance was assessed using the method developed by Holmes and colleagues; however, here tree congruence data were increased to compare 200 rather than 100 randomly generated maximum-likelihood (ML) trees (14).
The allelic sequences for each isolate were concatenated (2,545 bp), and phylogenetic trees, with 1,500 bootstrap replicates, were generated by the neighbor-joining method using the Jukes-Cantor algorithm within the MEGA software program, v4 (32). Isolates were grouped based upon the MLST definition of a clonal complex (or eBURST group) being a cluster of isolates sharing at least six of seven alleles, using the eBURST program, available at www.mlst.net. To visualize clustering within the population and to detect recombination between STs, splits decomposition analysis was performed using the SplitsTree software program, v4. Neighbor-joining phylogenetic analysis was done (www.mlst.net), and the index of association (IA) was calculated using the LIAN software program, v3.5 (www.pubmlst.org). Significant IA values were determined using the Monte-Carlo method with 1,000 resamplings. To ensure sampling bias did not affect the value, one representative of each ST was used. The allelic frequencies, GC content, number of polymorphic sites, and ratio of nonsynonymous substitutions to synonymous substitutions (dN/dS ratio) for all seven loci were calculated using the START software program, v2. Nucleotide identities were calculated using the maximum composite likelihood model within MEGA, v4.
To validate the MLST scheme, a collection of temporally and geographically diverse isolates was analyzed, including two equine isolates previously described by Henricson et al. (13). Cultures from both epidemiologically linked and unrelated cases were examined to assess the scheme's performance. Isolates previously designated as belonging to one of the four C. diphtheriae biotypes, gravis, mitis, intermedius, and belfanti, were typed by MLST to better understand their genetic relationships to each other and to determine their epidemiological value.
As shown in Table 3, among the 150 isolates investigated, the mean allele length for each locus was 363 bp and ranged from 342 bp (rpoB) to 384 bp (leuA). All alleles for a given locus were of equal lengths and, to aid further analysis, were in the correct reading frame. The proportion of variable sites at each locus varied from 5.1% (dnaE) to 10.3% (fusA) (mean = 7%).
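Because all alleles at a locus have the same length, the proportion of variable sites reduces to counting alignment columns that contain more than one nucleotide. The sketch below illustrates that count in Python; it is not the START implementation, and the function name and toy alleles are hypothetical.

```python
def variable_sites(alleles):
    """Return (number of variable sites, proportion of variable sites).

    alleles: list of equal-length sequences for a single locus.
    A site is variable if at least two different nucleotides occur there.
    """
    if len({len(a) for a in alleles}) != 1:
        raise ValueError("alleles at one locus must have equal lengths")
    # zip(*alleles) walks the alignment column by column.
    count = sum(1 for column in zip(*alleles) if len(set(column)) > 1)
    return count, count / len(alleles[0])


# Toy locus with three 6-bp alleles; positions 3 and 6 vary.
toy_alleles = ["ACGTAC", "ACGTAT", "ACCTAC"]
n_variable, proportion = variable_sites(toy_alleles)
print(n_variable, round(100 * proportion, 1))
```

For the toy alleles this reports two variable sites out of six. Applied to the real allele sets, the same count would yield the per-locus proportions summarized in the text.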
To determine the degree of selective pressure upon each locus, the ratio of nonsynonymous to synonymous substitutions (dN/dS) was determined. Since the ratios were significantly less than 1, it is clear that the genes chosen were not under positive selection and were therefore suitable for MLST analysis (Table 3).
A total of 73 STs were assigned to the 150 C. diphtheriae isolates investigated and divided into 11 clonal complexes designated by eBURST groups (Table 1). For this study, isolates were assigned as members of an eBURST group when six of seven MLST alleles were shared.
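The six-of-seven rule can be illustrated with a short single-linkage grouping in Python. This is a sketch of the grouping criterion only and not the eBURST program itself. It assumes that an ST joins a group when it shares at least six alleles with at least one existing member, and the ST labels and profiles below are invented for the example.

```python
from itertools import combinations


def shared_alleles(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry the same allele."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


def group_sts(profiles, min_shared=6):
    """Group STs by single linkage on the number of shared alleles.

    profiles: dict of ST label -> tuple of seven allele numbers.
    ASSUMPTION: sharing >= min_shared alleles with any one member is
    enough to join a group (groups of size one are singletons).
    """
    parent = {st: st for st in profiles}

    def find(st):
        # Follow parent links to the group representative, compressing the path.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical profiles: A-B and B-C are single locus variants, D is unrelated.
toy_profiles = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 2, 1, 1, 1),
    "ST-C": (1, 1, 1, 2, 1, 1, 3),
    "ST-D": (4, 4, 4, 4, 4, 4, 4),
}
toy_groups = group_sts(toy_profiles)
print(toy_groups)
```

In the toy data, ST-A and ST-C share only five alleles with each other but are linked through ST-B, so all three fall into one group, while ST-D remains a singleton.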
MLST identified two clonal complexes linked to diphtheria outbreaks. eBURST group 2 (composed of ST-8, ST-12, ST-52, and ST-66 strains) was associated with the FSU epidemic. Six isolates identified as having a Sankt-Peterburg (n = 3) or Rossija (n = 3) ribotype, the two clonally derived ribotypes linked with the FSU outbreak, clustered within this group, as did three epidemic strains identified by Skogen and colleagues in cultures obtained prior to the outbreak (31). Interestingly, the first nontoxigenic gravis strain to cause septicemia and endocarditis in Poland also belonged to eBURST group 2. This strain (493/K/04) was isolated in a region where no diphtheria cases had been reported for 10 years (34).
Twenty-eight of 31 isolates from Haiti (n = 14) and the Dominican Republic (n = 17) were collected during a diphtheria epidemic (2004 to 2006). Eighty-nine percent (n = 25) of the outbreak isolates were ST-31, and another (isolate 158) was ST-4, a single locus variant (SLV) of ST-31 at the fusA locus (98.06% allelic nucleotide similarity). The SLV is likely to have arisen by recombination, since multiple nucleotide substitutions were detected within the fusA variant and both alleles were identified in other isolates within the data set. Two distinct strains that also circulated during the epidemic, 154 and 166, were not closely related to the outbreak strains, sharing only two and three of the seven loci, respectively. The epidemic strain was present prior to the suspected outbreak since two of the three isolates obtained in the preepidemic period (1995 to 2000) also belonged to the outbreak clonal complex. The remaining isolate (isolate 11) was distinct from the epidemic cluster but was a double locus variant (DLV) of isolate 166 obtained in 2004.
MLST analysis indicates that some strains examined in this study are geographically dispersed while others are associated with specific geographical regions. For example, eBURST group 1 is composed of isolates from Poland, Russia, Kazakhstan, and the United States, while only isolates from the Caribbean region belong to eBURST group 9.
C. diphtheriae has been isolated from animals (2, 12, 18, 13, 27). Zoonotic infections with C. diphtheriae, although currently rare, may act as a reservoir for human infection. It is therefore important to characterize the isolates to understand their relationship to human strains. Two biotype gravis equine isolates, previously described by Henricson et al. (13), were identical by MLST, and although comprising a unique ST, they clearly cluster within the typical human C. diphtheriae population, as shown by Fig. 1.
In total, 86 validated ribotypes have been assigned to the C. diphtheriae ribotype database (11). As shown by Fig. 1, the examination of 29 previously ribotyped isolates by MLST was in concordance with the ribotyping data. This study of 20 of the total 86 ribotype patterns by MLST represents a preliminary comparison but is sufficient to help validate the MLST scheme in terms of correctly assigning strains that were identical and part of an outbreak and those believed to be clearly distinct strains.
In two instances, MLST provided greater strain discrimination than ribotyping. First, the Lyon ribotype isolates, 19 and 20, were SLVs of each other, differing at the atpA locus, and were identified as ST-18 and ST-19. This clonal expansion is likely to have arisen by recombination rather than point mutation, since four base pair changes (4/378) were identified and both alleles were frequently detected in other STs. De Zoysa et al. were also able to differentiate between the two isolates using AFLP, where a single band difference was observed (4). Second, one of three Sankt-Peterburg ribotype isolates (isolate 13) was an SLV at the atpA locus, where two base pair substitutions were identified.
Previous studies identified two predominant ribotypes associated with the FSU outbreak: Sankt-Peterburg and Rossija. Three isolates of each ribotype were analyzed by MLST. With the exception of isolate 13, discussed previously, all were indistinguishable by MLST (ST-8). AFLP, RAPD, and PFGE typing studies were also unable to differentiate between the two ribotypes, and only a single band difference was detected by ribotyping (3, 4, 5). Therefore, the preliminary MLST data further support the suggestion (4, 5) that Sankt-Peterburg and Rossija ribotyped isolates are part of a shared clonal complex, along with ST-12 and ST-52, here forming eBURST group 2.
Likewise, isolates belonging to ribotypes Vladimir (ST-25) and Lyon (ST-18 and ST-19) are clonally derived (eBURST group 5), as are isolates belonging to ribotypes Cluj (ST-5) and Gatchina (ST-9), closely related to each other (eBURST group 11).
There was some linkage between ST, biotype, and toxin status. However, of the STs containing more than one isolate, 21% were associated with multiple biotypes and 32% with variable toxin status. There was no clear association between disease status and ST, and carriage isolates could not be differentiated from disease-causing strains.
Importantly, while some of the clusters identified by MLST contained isolates of only one biotype, isolates with either the gravis, intermedius, or mitis biotype were not found within genetically distinct subgroups. However, as illustrated by Fig. 1 and 2, two distinct lineages (I and II) were identified within our C. diphtheriae collection. Lineage I contained the largest proportion of isolates, including all of the isolates biotyped as mitis, gravis, and intermedius. Five “belfanti” isolates were also found within lineage I; however, these “belfanti” isolates were all of ST-26, from James Bay, Canada, and all bar one produced diphtheria toxin (DT), which is atypical for the biotype (20). By comparison, lineage II was composed exclusively of the 11 remaining biotype belfanti isolates, representing 11 distinct STs. These lineage II belfanti isolates are of typical phenotype, being nontoxigenic. Furthermore, based upon concatenated MLST nucleotide data, lineage I strains were, on average, 2.5% divergent from those in lineage II, each forming distinct subgroups when represented by neighbor-joining trees (Fig. 1).
The balance between recombination and mutation has a significant impact upon the population biology of bacteria and their ability to evolve under strong selective pressures. As previously discussed, many SLVs within a clonal complex are likely to have arisen by recombination since multiple substitutions were observed and the allele was identified in other strains within the data set. The MLST data were further analyzed to determine the potential for genetic exchange within isolates of C. diphtheriae.
The degree of recombination within a bacterial population can be determined using the index of association (IA), which measures the level of linkage between alleles at different loci. An IA not significantly greater than zero after 1,000 computer randomizations suggests the organism is in linkage equilibrium and is therefore freely recombining, while a population with an IA significantly greater than zero is considered to be clonal. Overall, the IA value was 0.1176 (P < 0.01). To minimize any distortions created when analyzing two relatively distinct populations, the IA value was also individually calculated for both lineages I (0.0667; P < 0.01) and II (0.0395; P = 0.14).
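The logic of the test can be illustrated with a compact Python sketch. It is not the LIAN program, and details may differ from that implementation. It assumes the commonly cited standardized form, in which the observed variance of pairwise allelic mismatches (V_D) is compared with the variance expected under linkage equilibrium (V_E), and it estimates significance by shuffling alleles within each locus. All function names and the toy profiles are hypothetical.

```python
import random
from itertools import combinations


def observed_variance(profiles):
    """Variance of the number of mismatching loci over all pairs (V_D)."""
    d = [sum(a != b for a, b in zip(p, q)) for p, q in combinations(profiles, 2)]
    n_pairs = len(d)
    # Integer numerator keeps comparisons between resamplings exact.
    return (n_pairs * sum(x * x for x in d) - sum(d) ** 2) / n_pairs ** 2


def expected_variance(profiles):
    """Variance expected under linkage equilibrium (V_E)."""
    n = len(profiles)
    total = 0.0
    for column in zip(*profiles):
        freqs = [column.count(allele) / n for allele in set(column)]
        h = (n / (n - 1)) * (1 - sum(f * f for f in freqs))  # gene diversity
        total += h * (1 - h)
    return total


def standardized_ia(profiles):
    """Standardized index of association: (V_D / V_E - 1) / (loci - 1)."""
    loci = len(profiles[0])
    return (observed_variance(profiles) / expected_variance(profiles) - 1) / (loci - 1)


def monte_carlo_p(profiles, resamplings=1000, seed=0):
    """Fraction of resampled data sets with V_D >= the observed V_D.

    Shuffling within a locus preserves allele frequencies, so V_E is
    unchanged and only V_D needs to be compared.
    """
    rng = random.Random(seed)
    observed = observed_variance(profiles)
    columns = [list(column) for column in zip(*profiles)]
    hits = 0
    for _ in range(resamplings):
        for column in columns:
            rng.shuffle(column)
        if observed_variance(list(zip(*columns))) >= observed:
            hits += 1
    return hits / resamplings


# Toy clonal population: two fully linked three-locus genotypes, four copies each.
clonal = [(1, 1, 1)] * 4 + [(2, 2, 2)] * 4
ia_clonal = standardized_ia(clonal)
p_clonal = monte_carlo_p(clonal, resamplings=200)
print(round(ia_clonal, 4), p_clonal)
```

In the toy population every locus is perfectly linked to every other, so the standardized value reaches its maximum of 1 and few if any shuffled data sets match the observed variance. A freely recombining population would instead give a value near zero with a nonsignificant P value.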
To further examine the impact of recombination upon the C. diphtheriae collection, the congruence observed between gene tree topologies was determined using the method described by Holmes et al. (14). In clonal populations, the genetic relationships at all loci will be the same, while for recombinogenic species or subpopulations, the phylogenetic trees will appear incongruent. As shown by Table 4, pairwise comparisons between the gene tree topologies revealed significant discordance for all loci. However, only the association between the atpA ML tree and the dnaK, odhA, and rpoB trees and that between the dnaK tree and atpA and odhA were deemed to have no more similarity than that with trees of random topology. It is therefore apparent that while recombination plays a role in the evolution of C. diphtheriae, it has not obscured all phylogenetic signals.
Diphtheria is still endemic in many countries and, as exemplified by the FSU outbreak, is a potentially resurgent disease. Furthermore, given the low levels of protection within adult populations (in particular seniors), accurate and reproducible typing methods are required to monitor and characterize C. diphtheriae. Although numerous typing techniques for C. diphtheriae have been described, their use is often hindered by limited reproducibility and subjective analysis. MLST is able to circumvent these limitations by directly analyzing nucleotide information within selectively neutral housekeeping genes. The data produced are objective and, due to their portability, amenable to international collaborations. The C. diphtheriae MLST database can be accessed at http://pubmlst.org/cdiphtheriae.
This is the first use of MLST to characterize isolates of C. diphtheriae. MLST effectively typed 150 diverse C. diphtheriae isolates and confirmed findings of previous studies indicating that there is significant intraspecies genetic diversity. The data presented demonstrate that recombination has played a role in the evolution of C. diphtheriae. This was made evident by splits decomposition analysis and in the significant discordance observed between all MLST gene trees. However, since the congruence was deemed to show no greater similarity than that to trees of random topology within only two MLST loci, genetic exchange does not obscure all phylogenetic signals. Analysis of the complete genome sequence of C. diphtheriae reveals recent acquisition of pathogenicity factors. The observed recombination in this study highlights an obvious opportunity for these and other determinants to move across the population of C. diphtheriae.
To validate the accuracy and discriminatory power of the MLST scheme, the data were compared to an available subset of strains typed by the current gold standard, ribotyping. The MLST data were generally in concordance with the ribotyping findings. However, MLST provided greater strain resolution in two instances and ribotyping in one. MLST identified an SLV within three Sankt-Peterburg ribotype isolates and was able to distinguish between two ribotype Lyon isolates, which De Zoysa et al. had previously distinguished (5). As with AFLP, PFGE, and RAPD studies (3, 4, 5), MLST was unable to differentiate between the two predominant ribotypes associated with the FSU outbreak: Sankt-Peterburg and Rossija. Spoligotyping studies of this epidemic clonal group showed a clear divergence between these two ribotypes and suggested that the Rossija ribotype may have originated from one particular subpopulation of ribotype Sankt-Peterburg (21, 22). Likewise, it is clear by MLST that the Vladimir and Lyon ribotypes are clonally derived (eBURST group 5), as are the Cluj and Gatchina ribotypes (eBURST group 11).
MLST also identified a clonal complex (eBURST group 9) associated with an epidemiologically linked diphtheria outbreak in Haiti and the Dominican Republic, which share the island of Hispaniola. A suspected diphtheria outbreak is believed to have originated in the Fond des Blancs region of Haiti in 2004 (CDC, personal communication). From 2001 to 2003, 13 diphtheria cases in Haiti and 120 in the Dominican Republic were reported (www.who.int/immunization_monitoring/data/en/). This increased significantly, to 253 cases in Haiti and 177 in the Dominican Republic, from 2004 to 2006 (see above URL). Of 28 isolates obtained during the outbreak, 93% belonged to a clonal complex comprising ST-31 and ST-4 isolates (eBURST group 9). Since two further isolates, collected in 2000 and 1995, were ST-31 and ST-4, respectively, it is evident that the outbreak strains were circulating in the preepidemic period and may belong to a C. diphtheriae reservoir endemic to the region.
While members of some clonal complexes were globally distributed, other groups were associated with specific locations. Notably, members of eBURST group 8 were obtained from geographically disparate countries (Thailand, Russia, and Guatemala), whereas exclusively Russian isolates belonged to eBURST group 10. However, these findings may be the result of sampling limitations and require a wider sample analysis.
It was not possible to identify a definitive association between ST and toxigenicity for biotypes mitis, intermedius, and gravis. Likewise, there was not always a clear association between biotype and ST, indicating that biotypes are not necessarily stable epidemiological markers, which is wholly consistent with C. diphtheriae being identified in this work as having only a weakly clonal structure. However, isolates identified by biotype and nontoxigenic status as typical belfanti strains clustered together within lineage II.
MLST provides a valuable tool for monitoring and characterizing endemic and epidemic C. diphtheriae strains. The data produced are portable, reproducible, and unambiguous. Strain discrimination was in accordance with ribotyping data, and clonal complexes associated with disease outbreaks were identified.
We thank Keith Jolley for his assistance with the gene tree congruence data and for setting up the MLST database.
We thank the BBSRC, Micropathology LTD, and Society for General Microbiology for financial support at Warwick University. We also thank the Centers for Disease Control and the Institut Pasteur Foundation, which currently curates the C. diphtheriae MLST database (http://pubmlst.org/cdiphtheriae/).
Published ahead of print on 15 September 2010.