The genomic fine-typing of strains of Mycobacterium ulcerans, the causative agent of the emerging human disease Buruli ulcer, is difficult due to the clonal population structure of geographical lineages. Although large sequence polymorphisms (LSPs) resulted in the clustering of patient isolates originating from across the globe, differentiation of strains within continents using conventional typing methods is very limited. In this study, we analyzed M. ulcerans LSP haplotype-specific insertion sequence elements among 83 M. ulcerans strains and identified single nucleotide polymorphisms (SNPs) that differentiate between regional strains. This is the first genetic discrimination based on SNPs of M. ulcerans strains from African countries where Buruli ulcer is endemic, resulting in the highest geographic resolution of genotyping so far. The findings support the concept of genome-wide SNP analyses as tools to study the epidemiology and evolution of M. ulcerans at a local level.
Mycobacterium ulcerans causes the devastating cutaneous disease Buruli ulcer (BU). More than 30 countries worldwide have reported this emerging disease, reaching epidemic proportions in some areas, and children between the ages of 5 and 15 in the rural wetlands of West Africa are most affected (38). Although proximity to marshes and wetlands is a risk factor, the mode of transmission remains an enigma (10, 26, 27, 36, 37). Discrimination of genetic variants has become an indispensable tool to unravel the evolution, epidemiology, and transmission of pathogenic organisms and to gain insight into host-pathogen interactions (6, 13, 24). In M. ulcerans, such elucidation is impossible due to a remarkable lack of genetic diversity on a local geographic scale (22). Conventional genetic differentiation tools commonly used for phylogenetic profiling in Mycobacterium tuberculosis, such as restriction fragment length polymorphism, amplified fragment length polymorphism, variable-number tandem repeats (VNTR), and multilocus sequence typing, could distinguish between continental lineages only when applied to M. ulcerans (1-4, 7, 8, 15, 18, 29, 31, 34, 35). However, two publications using VNTRs reported the first discrimination of strains between and within African countries (16, 33). The identification of regions of difference (RDs) in a worldwide collection of M. ulcerans isolates led to an evolutionary scheme on the continental level, with two distinct genetic lineages that can be subgrouped into six haplotypes (20, 28). Strains of the “ancestral” lineage are genetically closer to Mycobacterium marinum, the progenitor of M. ulcerans, whereas the “classical” lineage accounts for the majority of BU cases and represents the most virulent genotype. Characterization of the large sequence polymorphisms (LSPs) showed that insertion sequence (IS) element (ISE) expansion is associated with the observed genome instability (17, 19, 40). 
ISEs are compact mobile DNA segments capable of inserting at multiple sites in a target molecule, usually by means of a recombinase encoded by a coding sequence (CDS) contained within the ISE itself (23). Thus, uncontrolled duplications and insertions of ISEs occur at relatively high frequency in replicating bacteria, leading to genomic insertions, deletions, and rearrangements that have the potential for molecular epidemiological applications. In M. tuberculosis, IS-mediated insertions/deletions (InDels) were until recently regarded as the principal source of genome plasticity (6) and are widely used as evolutionary markers in epidemiological studies. For M. ulcerans, two ISEs were defined, IS2404 and IS2606 (30, 32). Earlier, site-specific IS2404 elements were identified that are unique for and confined to distinct M. ulcerans haplotypes (19). Here, we specifically amplified such unique ISEs and compared their sequences for a collection of 83 M. ulcerans isolates, including 67 derived from Africa. We aimed to detect single nucleotide polymorphisms (SNPs) in these ISEs that would allow genetic distinction within haplotypes and on a regional level.
The isolates used for SNP identification and their countries of origin are listed in Table 1.
Genomic DNA from clinical and environmental isolates was extracted from bacterial pellets using an optimized method for mycobacterial DNA preparation (21). Bacterial pellets of about 20 mg (wet weight) were heat inactivated for 1 h at 95°C, followed by cell wall disruption and digestion. DNA was extracted from the supernatants by phenol-chloroform (Fluka, Buchs, Switzerland) extraction and subjected to ethanol precipitation as described previously (21). DNA was measured by the optical density at 260 nm using a NanoDrop 1000 spectrophotometer (Thermo Fisher, Waltham, MA).
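The absorbance reading can be turned into a working concentration with the conventional conversion for double-stranded DNA (an A260 of 1.0 over a 10 mm path corresponds to roughly 50 ng/μl). The following is a minimal sketch of that arithmetic, not the authors' procedure; the function names and the example reading of 0.5 are hypothetical, and the 5 ng target is the template amount stated for the PCR in this section.

```python
# Conventional conversion factor for double-stranded DNA: an absorbance of 1.0
# at 260 nm (10 mm path length) corresponds to roughly 50 ng/ul.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration in ng/ul from a 10 mm-normalized A260."""
    if a260 < 0:
        raise ValueError("absorbance must be non-negative")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def volume_for_mass(target_ng, concentration_ng_per_ul):
    """Volume in ul of extract that contains target_ng of DNA."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / concentration_ng_per_ul


# Hypothetical reading (not a value from the study): A260 = 0.5, undiluted.
example_concentration = dsdna_concentration(0.5)
# Volume of that extract needed to supply 5 ng of template DNA.
example_volume = volume_for_mass(5.0, example_concentration)
print(example_concentration, example_volume)
```

In practice a concentrated extract would be diluted first so that the pipetted volume is large enough to be accurate.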
Haplotype-specific copies of IS2404 were selected. For the African/Australian haplotypes, the primer pair MK323 (GCGGTACAAGCTTCCCAAAG) and MK814 (AGCCAGAGCTTTGGATTTGA) was applied to yield a PCR product of 2 kb comprising IS2404 (MUL_3871) in RD12, and the pair MK809 (GGTGCTTAACGAAACGTGCT) and MK808 (ACGAAATCGAATTCCTCGTG) was used to yield a PCR product of 2 kb comprising IS2404 (MUL_2990) in RD1. Primers MK808 and MK809 amplified a 360-bp PCR fragment of glnA3 lacking IS2404 in the South American and Asian haplotypes. The primer pair MK382 (GATCCTCGATCCGGTGTTC) and MK410 (GGATCTCCACCTTCGTCAAC) amplified a specific IS2404 element within RD9 confined to the South American haplotype, and primer pair MK892 (GCAATGTGATGCACAACCTC) and MK650 (CGTTCGATTTCACCTCACC) amplified a specific IS2404 element within RD11 unique for the Asian haplotype. The respective PCR products were sequenced using the PCR primers and the IS2404-specific internal primers MK661 (GATTGGTGCTCGGTCAACTC), MK662 (TCAGGTAGTGCGACTTCAAGG), MK663 (CAGCGTGGAGGTGGTCTATG), and SR685 (AGGCCAACACATCGAGAAAC) to cover the entire amplicon. PCR was performed using the FirePol 10× BD buffer and 0.5 μl of FirePol Taq polymerase (Solis BioDyne, Tartu, Estonia) with 5 ng of genomic DNA, 0.6 μM (each) of forward and reverse primer, 1.7 mM MgCl2, and a 0.3 mM concentration of each deoxynucleoside triphosphate in a total volume of 30 μl. PCRs were run in a GeneAmp PCR System 9700 PCR machine. The thermal profile for PCR amplification of Escherichia coli plasmids and M. ulcerans genomic DNA included an initial denaturation step of 95 to 98°C for 3 min, followed by 32 cycles of 95°C for 20 s, annealing at 58 to 65°C for 20 s, and elongation at 72°C for 30 s to 2 min. The PCRs were completed by a final extension step at 72°C for 10 min.
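The final concentrations given for the 30-μl reaction translate into absolute amounts and pipetting volumes by simple dilution arithmetic (C1 × V1 = C2 × V2). The sketch below illustrates this; the final concentrations and total volume are taken from the text, whereas the stock concentrations (10 μM primer, 25 mM MgCl2, 10 mM of each deoxynucleoside triphosphate) are assumptions chosen for illustration only, since the stocks used are not stated.

```python
TOTAL_VOLUME_UL = 30.0  # total reaction volume stated in the text


def amount_in_reaction(final_conc, total_volume_ul=TOTAL_VOLUME_UL):
    """Amount of a component in the reaction.

    A concentration in uM times a volume in ul gives pmol;
    a concentration in mM times a volume in ul gives nmol.
    """
    return final_conc * total_volume_ul


def stock_volume_ul(final_conc, stock_conc, total_volume_ul=TOTAL_VOLUME_UL):
    """Volume of stock to pipette (C1 * V1 = C2 * V2).

    Both concentrations must be in the same unit.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc * total_volume_ul / stock_conc


# Final concentrations from the text; stock concentrations are ASSUMED.
components = {
    # name: (final concentration, assumed stock concentration, unit)
    "each primer": (0.6, 10.0, "uM"),
    "MgCl2": (1.7, 25.0, "mM"),
    "each dNTP": (0.3, 10.0, "mM"),
}

for name, (final_conc, stock_conc, unit) in components.items():
    volume = stock_volume_ul(final_conc, stock_conc)
    print(f"{name}: {final_conc} {unit} final, {volume:.2f} ul of {stock_conc} {unit} stock")
```

With these numbers, each primer is present at 18 pmol, MgCl2 at 51 nmol, and each deoxynucleoside triphosphate at 9 nmol per reaction, independent of the assumed stocks.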
PCR products were analyzed on 1% agarose gels by gel electrophoresis using ethidium bromide staining and an AlphaImager illuminator (Alpha Innotech, San Leandro, CA). PCR amplicons were purified using a NucleoSpin purification kit (Macherey-Nagel, Düren, Germany) and subjected to direct sequencing by Macrogen, Seoul, South Korea. Primers (Sigma-Aldrich, Steinheim, Germany) were designed using Primer3 software, version 0.4.0 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi). All sequences were subjected to multiple sequence alignments using the ClustalW2 tool of the European Molecular Biology Laboratory-European Bioinformatics Institute (http://www.ebi.ac.uk/Tools/clustalw2/index.html) for phylogenetic analysis.
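Once sequences are aligned, polymorphic sites can be read off as the alignment columns in which more than one character state occurs. The following is a minimal sketch of that step, assuming the alignment has already been loaded as equal-length strings with "-" marking gaps; it is not the authors' pipeline, and the strain names and sequences are hypothetical toy data. Gap columns are reported as variable, in line with counting gaps among the distinguishing positions.

```python
def variable_positions(alignment):
    """Return 1-based alignment columns holding more than one character state.

    `alignment` maps a strain name to its aligned sequence; all sequences
    must have the same length. Gaps ('-') count as a character state.
    """
    sequences = list(alignment.values())
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must all have the same length")
    return [
        column + 1
        for column in range(length)
        if len({seq[column].upper() for seq in sequences}) > 1
    ]


# Hypothetical toy alignment (not data from this study).
toy_alignment = {
    "strain_A": "ACGTACGT",
    "strain_B": "ACGTACGT",
    "strain_C": "ACCTACGT",  # substitution in column 3
    "strain_D": "ACGTA-GT",  # gap in column 6
}

print(variable_positions(toy_alignment))  # [3, 6]
```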
We selected copies of four IS2404 elements that were earlier identified to be confined to one specific haplotype only (19). In RD9, an IS2404 element was inserted in between orthologues of the partial CDSs, MMAR_3539 and MMAR_3559, in only the South American haplotype (Fig. 1). In RD11, an IS2404 element was inserted in between the orthologues of the CDSs, MMAR_2557 and MMAR_2563, in only the Asian haplotype (Fig. 1). The two IS2404 elements in RD1 (MUL_2990) and RD12 (MUL_3871) were confined to the classical lineage (Fig. 1), represented by the African/Australian isolates. Primers flanking the respective ISEs and specific for the haplotype-specific constellation of the respective RDs were used to specifically amplify these four IS2404 element-containing loci without contamination by the many other copies of ISEs present in the M. ulcerans genomes. The RD9-associated IS2404 was amplified for two South American strains, and the RD11-associated IS2404 was amplified for two Asian strains, each belonging to the ancestral lineage. The two IS2404 versions associated with RD1 and RD12 were amplified for 79 M. ulcerans strains belonging to the classical lineage (Fig. 1). Fifty-four of these strains originated from Ghana, and 13 more were from other West and Central African countries; 12 strains derived from Australia, Malaysia, and Papua New Guinea. The chromosomal context within each haplotype was identical for all strains (data not shown), i.e., the nucleotide composition at the breakpoints, the adjacent regions, and the deleted DNA stretches associated with the ISE insertion (in RD9 and RD11). The IS2404 elements were between 1,362 and 1,367 bp long.
Within the South American haplotype, the patient isolate originating from Surinam could be distinguished by 11 SNPs (including one gap) from the one from French Guiana (Fig. 1). The two strains from Japan and China, belonging to the Asian haplotype, could be distinguished by eight SNPs (including one gap) from each other (Fig. 1). Within the African/Australian haplotype, the two RD1- and RD12-associated ISEs together contained 72 variable nucleotide positions that cluster the 79 strains into 11 groups, which we designated ISE-SNP types (Fig. 1). When these SNPs were linked to epidemiological data (Fig. 2), we found that the West African region from the Ivory Coast to Togo harbors a mixture of ISE-SNP types. All four patient isolates from Benin showed identical nucleotide sequences defined by one common SNP (T548C). This polymorphism is shared by the environmental strain BEN001441 (25). We found four genotypes represented within southern Ghana, with a majority having ISE-SNP type 2. The ISE-SNP type 1 cluster contains the sequenced reference strain Agy99 along with another strain from Ghana as well as isolates from the Democratic Republic of Congo, the Ivory Coast, and Angola. ISE-SNP type 3 from within the Greater Accra region in Ghana comprises four identical strains that differ in one SNP (T-A) from other M. ulcerans isolates of the same area. Two strains from Ghana and one from the Ivory Coast (forming ISE-SNP type 7) are, with respect to the ISE-SNP type, quite distinct from the remainder of the African isolates but were found in closer genetic proximity to strains from Southeast Asia and Australia (ISE-SNP types 8 to 11) (Fig. 2). Within the Australian isolates, the insertion of 18 nucleotides in five strains (Fig. 1, ISE-SNP types 9 and 10) has probably emerged through homologous recombination of an internal part of another IS2404 fragment in a common progenitor, grouping these strains originating from Papua New Guinea, Australia, and Malaysia together. However, other M. ulcerans isolates from southern Australia, i.e., Victoria (ISE-SNP type 11), do not have this small insert, resulting in two different clusters within Australia.
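Grouping strains into types of this kind amounts to collecting, for each strain, its characters at the variable alignment positions and pooling strains with identical profiles. The sketch below illustrates the idea under the assumption that the aligned sequences of the loci have been concatenated into one string per strain; the function name, strain names, and sequences are hypothetical, and the type numbering used in the study is not reproduced.

```python
from collections import defaultdict


def group_by_snp_profile(alignment):
    """Group strains whose sequences agree at every variable column.

    `alignment` maps strain name -> aligned sequence (equal lengths,
    '-' for gaps). Returns a dict mapping each SNP profile (the characters
    at the variable columns, in column order) to the list of strains
    carrying it.
    """
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must all have the same length")

    # 0-based columns in which at least two strains differ.
    variable = [
        column
        for column in range(length)
        if len({alignment[name][column] for name in names}) > 1
    ]

    groups = defaultdict(list)
    for name in names:
        profile = "".join(alignment[name][column] for column in variable)
        groups[profile].append(name)
    return dict(groups)


# Hypothetical toy alignment (not data from this study).
toy_alignment = {
    "strain_A": "ACGTACGT",
    "strain_B": "ACGTACGT",
    "strain_C": "ACCTACGT",
    "strain_D": "ACGTA-GT",
}

for profile, members in group_by_snp_profile(toy_alignment).items():
    print(profile, members)
```

In this toy example the four strains fall into three groups, because strain_A and strain_B are identical at both variable columns.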
Here, we defined ISE-SNP types in Africa that seem to be either geographically clustered (e.g., ISE-SNP types 5 and 6) or more widespread (e.g., ISE-SNP types 1 and 7). These genotypes unveil a clearer picture of M. ulcerans dispersal and epidemiology in Africa and on a worldwide scale.
Characterization of InDel diversity among a worldwide collection of M. ulcerans strains by comparative genomic hybridization analysis (20) has yielded markers for the investigation of the phylogeography of M. ulcerans patient isolates on a global scale. Continental haplotypes with unique constellations in particular RDs were defined (19, 20). Here, we combine the strength of lineage-specific unequivocal genetic InDel markers with the high-resolution power of SNPs. We determined the nucleotide sequence of RD-associated haplotype-specific copies of IS2404 and identified SNPs, allowing further subdivision of continental lineages. In particular, within the classical lineage, sequence analysis of the RD1- and RD12-associated IS2404 elements yielded 11 SNP types (ISE-SNP types) across a panel of 79 M. ulcerans strains. Since the two selected ISEs are identical in their chromosomal context across the tested classical lineage strains, the haplotype-specific insertions in RD1 and RD12 must have occurred in a common ancestor, and accumulation of SNPs represents secondary events. Since IS2404 is highly redundant in M. ulcerans, the occurrence of point mutations, whether synonymous or nonsynonymous in nature, is irrelevant for the microbe's biology.
The resolution of ISE-SNP typing is higher than that achieved with other DNA fingerprinting techniques: ISE-SNP types correlated with the more limited VNTR/mycobacterial interspersed repetitive unit-VNTR fine-typing and, in particular, enhanced the resolution within the Atlantic African genotype (16, 33). Some ISE-SNP types seem to be widespread across West and Central African countries (e.g., ISE-SNP types 1, 2, 3, and 7 in the Ivory Coast, Ghana, Togo, the Democratic Republic of the Congo, and Angola). Others appear more delimited (such as ISE-SNP type 5 in Benin and types 4 and 6 in the Democratic Republic of the Congo). Among M. ulcerans isolates from Ghana, four ISE-SNP types (1, 2, 3, and 7) were identified. The retrieved phylogenetic tree (Fig. 2) depicts the highest resolution of M. ulcerans phylogeny within and between continents achieved so far. The ISE-SNP type analysis revealed genetic relatedness of a subgroup of African strains (ISE-SNP type 7 from Ghana and the Ivory Coast) to the Southeast Asian/Australian clusters (ISE-SNP types 8 through 11). This relatedness suggests that ISE-SNP type 7 and the Australian M. ulcerans strains may derive from common ancestors. Interestingly, the only M. ulcerans isolate ever cultivated from the environment and originating from Benin (25) showed the same ISE-SNP type as the patient isolates coming from the same country, supporting the current hypothesis that BU results from environmental exposure.
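Relatedness between types of this kind can be summarized, in its simplest form, by counting the alignment positions at which two sequences differ. The sketch below computes such pairwise difference counts, assuming equal-length aligned sequences with gaps counted as differences; it is a simplified stand-in for, not a reproduction of, the tree-building analysis used in this study, and the sequence names and data are hypothetical.

```python
from itertools import combinations


def pairwise_differences(alignment):
    """Count differing columns for every pair of aligned sequences.

    `alignment` maps a name to an aligned sequence; a gap ('-') facing a
    nucleotide counts as one difference. Returns {(name_a, name_b): count}
    with each pair in alphabetical order.
    """
    result = {}
    for name_a, name_b in combinations(sorted(alignment), 2):
        seq_a, seq_b = alignment[name_a], alignment[name_b]
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must all have the same length")
        result[(name_a, name_b)] = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return result


# Hypothetical toy sequences (not data from this study).
toy_sequences = {
    "seq_x": "ACGTACGT",
    "seq_y": "ACCTACGT",
    "seq_z": "TCCTA-GT",
}

for pair, count in pairwise_differences(toy_sequences).items():
    print(pair, count)
```

A matrix of such counts is the usual starting point for distance-based clustering; here seq_y lies between seq_x and seq_z, differing from them at one and two positions, respectively.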
Within the M. tuberculosis complex, and even within M. ulcerans-related mycolactone-producing mycobacteria, analysis of LSPs represents a valuable approach for genetic fingerprinting (5, 6, 12, 18). However, with the increasing availability of multiple whole-genome sequences, SNP identification adds considerably to phylogeographic analyses (11, 13, 14). We conclude that also for M. ulcerans, SNP typing rather than analysis of LSPs will yield sufficient resolution for microepidemiological studies. The resolution obtained here for the classical lineage is thus far based on only two copies of IS2404. Analysis of a larger number of ISE copies or of the entire genome of a collection of isolates may yield a large enough number of SNPs to resolve the spatial and temporal dispersal of genetic M. ulcerans variants on the regional level.
A subsequent study that applied next-generation sequencing to two additional genomes of M. ulcerans strains from Ghana confirmed our conclusion in revealing 68 SNP loci that led to the differentiation of a collection of 54 strains from this region of endemicity into 13 SNP haplotypes (W. Qi, M. Käser, K. Röltgen, D. Yeboah-Manu, and G. Pluschke, PLoS Pathog. 5:e1000580, 2009).
This research activity was part of the Stop Buruli initiative funded by the UBS Optimus Foundation, Switzerland.
We thank Konstantina Boutsika for technical assistance.
Published ahead of print on 2 September 2009.