Conceived and designed the experiments: MK. Performed the experiments: MK. Analyzed the data: MK. Wrote the paper: MK GP. Supervised the project: GP.
Based on large genomic sequence polymorphisms, several haplotypes belonging to two major lineages of the human pathogen Mycobacterium ulcerans could be distinguished among patient isolates from various geographic origins. However, the biological relevance of insertional/deletional diversity is not understood.
Using comparative genomics, we have investigated the genes located in regions of difference recently identified by DNA microarray based hybridisation analysis. The analysed regions of difference comprise ~7% of the entire M. ulcerans genome.
Several different mechanisms leading to loss of functional genes were identified, ranging from pseudogenization, caused by frame shift mutations or mobile genetic element interspersing, to large sequence polymorphisms. Four hot spot regions for genetic instability were unveiled. Altogether, 229 coding sequences were found to be differentially inactivated, constituting a repertoire of coding sequence variation in the rather monomorphic M. ulcerans.
The differential gene inactivation patterns associated with the M. ulcerans haplotypes identified candidate genes that may confer enhanced adaptation upon ablation of expression. A number of gene conversions confined to the classical lineage may contribute to particular virulence of this group comprising isolates from Africa and Australia. Identification of this spectrum of anti-virulence gene candidates expands our understanding of the pathogenicity and ecology of the emerging infectious disease Buruli ulcer.
The emerging human disease Buruli ulcer, caused by Mycobacterium ulcerans, poses a growing challenge for public health systems in many countries, mainly in West and Central sub-Saharan Africa. Genetic differentiation of patient isolates, a prerequisite for scientific studies of disease transmission and dispersal and for interventions against them, is hampered by an exceptional lack of genetic diversity within this species. Comparative genomics of M. ulcerans isolates of worldwide geographical origin has already allowed several haplotypes, separated into two distinct lineages, to be distinguished. Differences in prevalence and incidence of Buruli ulcer were already suspected, but their biological basis was unclear. Here, we show newly identified hot spot regions of genomic instability, a biased silencing of coding sequences belonging to distinct functional groups, and a differential gene repertoire across M. ulcerans strains. Gene inactivation mediated by different mechanisms in M. ulcerans adds to the concept of anti-virulence genes observed in an increasing number of bacterial species. According to this concept, loss of such genes, in addition to gain of function, may confer a selective advantage for a pathogen radiating into a new niche. In the case of M. ulcerans, a distinct set of disrupted genes may enhance virulence, particularly in the classical lineage.
Mycobacterium ulcerans is the etiologic agent of the emerging human disease Buruli ulcer, the third most common mycobacterial disease, which occurs in more than 30 countries. It is associated with necrosis of subcutaneous tissues, mainly in the extremities of children, and often leads to severe disability. Due to an exceptional lack of genetic diversity in M. ulcerans, genetic fingerprinting methods for studies on disease transmission are currently not available. In M. tuberculosis, single nucleotide polymorphisms (SNPs) and large sequence polymorphisms (LSPs) are used to investigate global dissemination and to rapidly track transmission pathways. Earlier, we identified regions of difference (RDs) between M. ulcerans patient isolates originating from different geographical areas. These genomic variations, caused by deletions, combined insertions/deletions (InDels), insertions of mobile insertion sequence elements (ISEs), and genome rearrangements, proved to be useful genetic markers for phylogenetic analyses. There is evidence that the most recent common ancestor of M. ulcerans developed from the fish pathogen M. marinum, for which a whole genome sequence was recently completed. We have identified six InDel haplotypes that can be grouped into two distinct lineages: the ancestral lineage, comprising the haplotypes from Asia, South America, and Mexico, which is genetically closer to M. marinum in RD composition, and the classical lineage, comprising the haplotypes originating from Africa, Australia, and South East Asia. Although the number of Buruli ulcer cases may be largely underestimated in some of the endemic countries, the main prevalence is in West Africa. The continental distribution of severe disease, focussing on West Africa and Australia, correlates with the presence of the M. ulcerans classical lineage, which is increasingly suspected to be more pathogenic than the ancestral lineage.
The major virulence determinant of M. ulcerans is the immunosuppressive and cytotoxic macrolide toxin, mycolactone, produced by enzymes encoded by the virulence plasmid, pMUM001. In addition to such gain-of-function pathogenic factors, virulence can also be determined by genes that confer enhanced adaptation upon loss of their function, since their expression is detrimental for a pathogen radiating into new niches. Such factors, designated anti-virulence genes, are being identified for an increasing number of prokaryotic pathogens, including M. tuberculosis. Orthologues of CDSs that are essential for pathogenicity in M. tuberculosis, such as members of the ESX-1 secretion apparatus and α-crystallin-like protein (HspX), were recently shown to be differentially affected by gene inactivation between the haplotypes of M. ulcerans, probably for reasons of evasion from the hosts' immune system.
In this report, we provide a detailed description of RDs among the otherwise genetically monomorphic M. ulcerans patient isolates of world-wide origin, covering ~7% of the whole genome and comprising 338 coding sequences (CDSs). First, this comprehensive comparison led to the identification of a set of genes that were differentially inactivated across M. ulcerans haplotypes. Second, this differential gene repertoire may have implications for lineage specific differences in ecology and virulence of M. ulcerans and the predominant prevalence of Buruli ulcer in West-Africa and Australia. We hypothesize that, in addition to the acquisition of the plasmid, comprising the mycolactone encoding gene cluster, loss of distinct anti-virulence genes was important for the development of a highly virulent lineage of mycolactone producing mycobacteria.
The M. ulcerans strains used in this study, all isolated from lesions of human Buruli ulcer patients, are as follows (described in more detail elsewhere). For the classical lineage: Ghana IFIK 1066089 (this study), Ghana Agy99, Ghana ITM 970321, Ghana ITM 970359, Ghana ITM 970483, Ivory Coast ITM 940662, Ivory Coast ITM 940815, Ivory Coast ITM 940511, Benin ITM 970111, Benin ITM 940886, Benin ITM 940512, Benin ITM 970104, Democratic Republic of Congo (DRC) ITM 5150, DRC ITM 5151, DRC ITM 5155, Togo ITM 970680, Angola ITM 960657, Angola ITM 960658, Papua New Guinea (PNG) ITM 941331, PNG ITM 9537, Malaysia ITM 941328, Australia ITM 941324, Australia ITM 941325, Australia ITM 941327, Australia ITM 9549, Australia ITM 9550, Australia ITM 8849, Australia ITM 940339, Australia ITM 5142, and Australia ITM 5147. For the ancestral lineage: China ITM 980912, Japan ITM 8756, French Guiana ITM 7922, Surinam ITM 842, Mexico ITM 5114, and Mexico ITM 5143. The clinical isolate M. marinum strain M (ATCC BAA-535) was used for interspecies comparison.
Bacterial pellets of about 60 mg (wet weight) were heat inactivated for 1 hour at 95°C in 500 µl extraction buffer (50 mM Tris-HCl, 25 mM EDTA, 5% monosodium glutamate), and sequentially treated with lysozyme (2 h, 37°C, 17 M lysozyme) and proteinase K (overnight, 45°C, 0.3 M proteinase K in proteinase K buffer: 1 mM Tris-HCl, 5 mM EDTA, 0.05% SDS, pH7.8). After digestion, the samples were subjected to bead beater treatment (7 min, 3000 rpm, Mikro-Dismembrator, B. Braun Biotech International, Melsungen, Germany) with 300 µl of 0.1 mm zirconia beads (BioSpec Products, Bartlesville, OK, USA). DNA was extracted from the supernatants by phenol-chloroform (Fluka, Buchs, Switzerland) extraction and subjected to ethanol precipitation. DNA concentration was measured by optical density at 260 nm (GeneQuant spectrophotometer, Pharmacia Biotech, Cambridge, UK).
PCR was performed using FirePol 10× BD buffer and 0.5 µl FirePolTaq-Polymerase (Solis BioDyne, Tartu, Estonia), 5 ng genomic DNA or the corresponding volume of RNase-free water as a negative control, 0.6 µM each of forward and reverse primers, 1.7 mM MgCl2 and 0.3 mM of each dNTP in a total volume of 30 µl. Long-range PCR polymerase mix (Fermentas, St. Leon-Rot, Germany) was applied according to the manufacturer's protocol to retrieve PCR products longer than 3 kb and up to 8 kb. PCR reactions were run in a GeneAmp PCR System 9700 PCR machine. The thermal profile for PCR amplification of M. ulcerans genomic DNA included an initial denaturation step of 95–98°C for 3 min, followed by 32 cycles of 95°C for 20 sec, annealing at 58–65°C for 20 sec, and elongation at 72°C for 30 sec up to 4 min. The PCR reactions were finalized by an extension step at 72°C for 10 min. PCR products were analyzed on 1–2% agarose gels by gel electrophoresis using ethidium bromide staining and the AlphaImager illuminator (Alpha Innotech, San Leandro, CA, USA). PCR fragments produced for analysis of unknown genomic sequences were purified using the NucleoSpin purification kit (Macherey-Nagel, Düren, Germany) and either subjected to direct sequencing or cloned using the TOPO TA Cloning Kit (Invitrogen Corporation, Carlsbad, CA, USA), transformed into JM109 (Sigma-Aldrich, Buchs, Switzerland) bacterial cells, and sequenced after DNA preparation (Miniprep-Kit, Sigma-Aldrich, Buchs, Switzerland). Sequencing was performed using the Big Dye kit and the AbiPrism310 genetic sequence analyzer (Perkin-Elmer, Waltham, MA, USA). Primers (Sigma-Aldrich, Steinheim, Germany) were selected based on the genome sequences of M. ulcerans Agy99 (GenBank accession number CP000325) and M. marinum M (GenBank accession numbers CP000854 and CP000895) using the Primer3 software (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) and, for unknown regions, combined with outward directed primers corresponding to sequences within the IS2404 and IS2606 elements.
Primers (Sigma-Aldrich, Steinheim, Germany) and TaqMan probes (Biomers, Ulm, Germany) were designed using the Primer Express software version 2.0 (Applied Biosystems, Foster City, CA, USA). Each probe was labeled at the 5′ end with the fluorescent dye FAM and at the 3′ end with the quencher TAMRA. Primers and probes targeted M. ulcerans Agy99 sequences of IS2404 (IS2404cf AAAGCACCACGCAGCATCTT, IS2404cr AGCGACCCCAGTGGATTG, and IS2404cp FAM-CCGTCCAACGCGATCGGCA-TAMRA), IS2606 (IS2606f TGCTGACGGAGTTGAAAAACC, IS2606r CCTTTGAGGCCGTCACAGA, and IS2606p FAM-CGGCGTGGCCGACATCTTCTTC-TAMRA), and GroEL (GroELf CCTGCTGAGCGTCGAAGTC, GroELr GGGCACCGAGCTGGAGTT, and GroELp FAM-CCGAGAGGTATCCCTTGTCGAAACCG-TAMRA). Real-time PCR mixtures contained 50 fg of template DNA, 900 nM of TaqMan probe, 300 nM of each primer, and TaqMan Universal PCR Master Mix (Applied Biosystems, Foster City, CA, USA) in a total volume of 25 µl. Amplification and signal detection were performed using the 7500 Real Time PCR System (Applied Biosystems, Foster City, CA, USA) under the following conditions: 1 cycle of 50°C for 2 min, 1 cycle of 95°C for 10 min, and 40 cycles of 95°C for 15 s and 60°C for 1 min. Quantitative TaqMan real-time PCR CT values for the ISEs were normalized against the CT value of the single copy GroEL target sequence. Samples were run at least twice and negative controls were included in each assay. The estimated difference in mean CT values between the lineages was calculated together with the 95% confidence interval (CI).
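The normalization and lineage comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the CT values are invented, the function names are hypothetical, and the confidence interval uses a normal approximation because the text does not state which method was applied (with only a few samples per group, a t-based interval would be wider).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def delta_ct(ct_target, ct_reference):
    """Normalize a target CT by the single-copy reference CT of the same sample."""
    return ct_target - ct_reference


def mean_difference_ci(group_a, group_b, level=0.95):
    """Return (a - b) difference in means with a normal-approximation CI.

    Assumption: normal approximation with unpooled variances; the study
    does not specify how its interval was computed.
    """
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical (IS element CT, GroEL CT) pairs; NOT data from the study.
ancestral = [(27.0, 21.0), (27.5, 21.2), (26.8, 20.9)]
classical = [(20.9, 21.1), (21.3, 21.0), (20.6, 20.8)]

ancestral_norm = [delta_ct(target, ref) for target, ref in ancestral]
classical_norm = [delta_ct(target, ref) for target, ref in classical]

diff, low, high = mean_difference_ci(ancestral_norm, classical_norm)
print(f"mean normalized CT difference: {diff:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Subtracting the GroEL CT from each insertion sequence CT removes differences in template input between samples, so the remaining difference between lineages reflects relative copy number rather than DNA amount.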
For four InDel haplotypes the following strains were used as representatives: Ghana IFIK 1066089 and Ghana 970359; Australia 941324 and Australia 940339; China 980912 and Japan 8756; French Guiana 7922 and Surinam 842; Mexico 5143 and Mexico 5114. The two haplotypes within the classical lineage, Australia 5142/47 and Australia 9549, each differed from Agy99 in only one InDel and were thus excluded from the RD description. DNA sequences were retrieved using a combination of genome sequence scanning, primer walking, and sequence gap bridging, as described earlier. Sequences were aligned to the recently published M. marinum M genome to determine the absence or presence of CDSs. Comparative in silico sequence analysis was performed using the sequence manipulation suite (http://bioinformatics.org/sms/index.html), the sequence alignment tool blast 2 sequences (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi), the multiple sequence alignment website Multalin (http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html), the Artemis software release 9, and the Artemis Comparison Tool software release 6.
Within the analysed 7% of the entire M. ulcerans genome associated with RDs1 to 15, we observed various genetic mechanisms that led to specific ablation of the expression of sets of proteins across the six haplotypes: i) frameshift mutations resulting in pseudogenization, ii) interspersing of ISEs into CDSs that led to their disruption, and iii) physical deletions of between 2 and 53 kbp in which replacement by ISEs made the involvement of these elements obvious. Both pseudogenization or functional disruption, which leave the CDSs as scars in the genome, and physical deletion of the CDSs lead to gene silencing. Throughout the RDs, the two M. ulcerans lineages show a strong bias in the mechanisms leading to gene loss: in the ancestral lineage deletions of large DNA stretches play a major role, whereas the classical lineage shows a preponderance of ISEs interrupting CDSs, often even without concurrent deletions, as shown for RD1 in Fig. 1. Although a sequence of events cannot be deduced for RD1 from Fig. 1, it is clear that the inactivation of MMAR_2766, involved in lipid metabolism, was mediated by independent InDel events in the two lineages. In the ancestral lineage, five additional genes were lost with the 8 kb deletion, whereas only in the classical lineage did interspersing of an IS2404 element into glnA3 lead to its functional disruption (Fig. 1). Thus, independent InDel events have led to a differential gene repertoire between the two lineages. Fig. 2 gives a comprehensive reference overview of all genome variations in the identified RDs1 through 15 and shows a variety of such events. A detailed list of the differentially deleted genes, corresponding to Fig. 2, is provided in Table S1.
RDs1 through 15 are evenly distributed on the genome as shown in Fig. 2. An overlay of positions of both ISEs and RDs (Fig. 3A) for the whole genome sequences of M. marinum M and M. ulcerans Agy99 shows that most RDs are associated with the presence of ISEs. Comparison of the two M. ulcerans lineages throughout RDs1 to 15 revealed a difference in ISE abundance (Fig. 2 and 3B), and Southern hybridization of representatives of the two lineages already indicated significant differences for IS2606. We therefore compared the number of whole genome IS2404 and IS2606 copies by quantitative real-time PCR (Fig. 3B, C). The estimated mean difference between the classical and ancestral lineage for IS2404 signals was 1.66 (95% CI=0.64 to 2.68), indicating that the pronounced difference in abundance of IS2404 between the two lineages was largely restricted to the analysed RDs. However, for IS2606 an elevated CT value (27.24) was measured in the ancestral lineage resulting in an estimated mean difference between the lineages of 6.34 (95% CI=4.87 to 7.81). This reflects a very low abundance of IS2606 in the whole genome of strains of the ancestral lineage, explaining the observed lack of IS2606 involvement in genome rearrangements in this lineage.
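To give a feel for the size of these CT differences, the sketch below converts them into approximate fold differences in template copy number. It assumes perfect doubling of product per PCR cycle (an amplification efficiency of 2), which is a common simplification and not a value reported in the study; the function name is hypothetical.

```python
def fold_difference(ct_difference, efficiency=2.0):
    """Convert a CT difference into an approximate fold difference in copies.

    efficiency=2.0 assumes perfect doubling per cycle; this is an assumption,
    not a measured value from the study.
    """
    return efficiency ** ct_difference


# Mean CT differences and 95% CI bounds as reported in the text.
reported = {
    "IS2404": (1.66, 0.64, 2.68),
    "IS2606": (6.34, 4.87, 7.81),
}

for element, (mean_diff, ci_low, ci_high) in reported.items():
    print(
        f"{element}: about {fold_difference(mean_diff):.1f}-fold "
        f"(CI bounds about {fold_difference(ci_low):.1f} "
        f"to {fold_difference(ci_high):.1f})"
    )
```

Under this assumption, the IS2404 difference corresponds to roughly a threefold difference in copy number, whereas the IS2606 difference corresponds to roughly an eightyfold lower abundance in the ancestral lineage.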
The investigated RDs comprise, in their ~400 kbp of DNA sequence, 338 genes with respect to the M. marinum M sequence. Altogether, 229 genes were found to be affected by differential inactivation. While a number of these genes were lost or inactivated in only one of the haplotypes (32 in the classical lineage), a large fraction (156) of the genes were silenced by independent events in two or more haplotypes (Fig. 4; for a comprehensive list see Fig. 2 and Table S1). This gene repertoire constitutes a broad spectrum of genomic variation at the CDS level in the otherwise genetically monomorphic M. ulcerans. Subdivision of the lost or pseudogenized CDSs into functional protein categories (Fig. 4) showed that i) proteins lost only in the ancestral lineage belong predominantly to the functional categories cell wall/cell processes, lipid metabolism, intermediary metabolism/respiration and regulatory proteins; and ii) for the proteins lost in both lineages the categories virulence/detoxification/adaptation and PE/PPE proteins are overrepresented. When set in relation to the number of genes allocated to the functional categories in the whole genome, over 10% of all virulence/detoxification/adaptation and PE/PPE protein genes have been inactivated in one or both lineages in the analysed 7% of the genome alone (Fig. 4). We identified four regions of preferential genome instability (RDs9, 12, 13, 14) with twelve CDSs that were inactivated by three different events in the haplotypes analysed (Table 1). Seven of these CDSs code for proteins likely to interact with the environment or host of the bacterial cells (secreted or membrane proteins and PE/PPE proteins). Three of the CDSs are involved in the mycobacterial ESX-1 secretion apparatus, and a further one, embR_1, is involved in cell wall biosynthesis.
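The proportions implied by these counts can be worked out directly. The sketch below uses only the figures quoted in the text; the variable names are illustrative, the reading of 156 as a subset of the 229 affected genes is an interpretation of the sentence above, and the genome size is merely what the approximate values "~400 kbp" and "~7%" imply, not a measured figure.

```python
# Figures quoted in the text.
genes_in_rds = 338                 # genes in the analysed RDs (relative to M. marinum M)
differentially_inactivated = 229   # genes affected by differential inactivation
silenced_independently = 156       # silenced by independent events in >= 2 haplotypes
rd_length_kbp = 400                # approximate combined length of the RDs
genome_fraction = 0.07             # approximate share of the genome analysed

# Share of RD genes that are differentially inactivated.
affected_share = differentially_inactivated / genes_in_rds

# Share of affected genes silenced independently in two or more haplotypes
# (assumes the 156 genes are counted among the 229).
independent_share = silenced_independently / differentially_inactivated

# Genome size implied by the two approximate values above (not a measurement).
implied_genome_kbp = rd_length_kbp / genome_fraction

print(f"affected genes in RDs: {affected_share:.1%}")
print(f"independently silenced among affected: {independent_share:.1%}")
print(f"implied genome size: about {implied_genome_kbp:.0f} kbp")
```

About two thirds of the genes in the analysed regions are therefore differentially inactivated, and about two thirds of those were hit by independent events in more than one haplotype.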
Deletions are unidirectional events that serve as irreversible genetic and evolutionary markers, and their characterisation has repeatedly proven to be a powerful tool for phylogenetic analysis of mycobacteria and studies of their global and regional epidemiology. The described polymorphisms in RDs1 to 15 can be used to distinguish M. ulcerans haplotypes and to position newly identified isolates in the established evolutionary scenario. In the composition of their RDs, M. ulcerans members of the ancestral lineage resemble M. marinum strain M much more closely than do M. ulcerans strains of the classical lineage. Therefore, alignments of their genomic sequence to the M. marinum M sequence provided a clearer picture of the phylogeny than a mere comparison of the sequences of the M. ulcerans lineages. The detailed analysis of the RDs provides a repertoire of genes differentially silenced between the M. ulcerans haplotypes from different geographic origins.
The observed loss of genes supports findings that M. ulcerans lineages are undergoing reductive evolution to become niche-adapted specialists. Loss of gene functions under conditions of habitat change may simply be tolerated because the functions are less required than in a generalist ancestor. However, in contrast to such random loss, several observations in the present analysis of 400 kbp of the M. ulcerans genome suggest a selective advantage of loss of expression of particular genes: i) the identification of hot spot regions of genome instability, ii) the clustering of silenced genes into functional categories, and iii) the inactivation of a bulk of genes in different haplotypes by independent events, which exceeds what is expected by chance alone. Some of these CDSs, whether deleted twice independently or in a single haplotype, might turn out to be patho-adaptive or anti-virulence genes, although experimental work has to verify this hypothesis. There is compelling evidence from studies in other mycobacteria that this is a real phenomenon. For example, mutations at different positions of echA13 (also found in this study) and two other genes among a selection of mycolactone producing mycobacteria already led to the assumption of an independent, purifying selection. Some of the identified gene products in RDs1 to 15 are likely to influence interaction of mycobacteria with the environment (e.g. members of the PE/PPE protein family and dehydrogenases, in part determining the cell wall lipid composition) or are known antigens in M. tuberculosis (e.g. the esx family proteins, Mpt63, and HspX). As already suspected for M. africanum and M. ulcerans, the expression of esxA/esxB and/or HspX may be detrimental in a changing habitat or upon exposure to immune pressure. In hypervirulent strains of M. tuberculosis, deletions in metabolic enzymes, cell surface-exposed proteins or regulators that respond to environmental stimuli have been identified. For example, disruption of the mce1 operon or regulators thereof possibly modulates the host's proinflammatory response and accelerates an immunopathological response in mice. Also, an M. tuberculosis orthologue of embR_1, identified in this study as having been independently disrupted three times, closely interacts with PknH, whose deletion was shown to result in a hypervirulent phenotype. Thus, genes listed in Table 2 should be among the first to be investigated for their role in patho-adaptation of M. ulcerans. Interestingly, no orthologues of the differentially silenced CDSs with known function listed in Table 2 are found in the genome sequence of M. leprae TN. After the description of the genome sequence of the African isolate Agy99, this list of candidate anti-virulence genes constitutes a further step towards the description of the virulome of M. ulcerans.
It was earlier suspected that the ancestral and the classical lineage of M. ulcerans differ in virulence potential, and further evidence for that was provided by a recent study in Peru, where Buruli ulcer cases are scarce despite frequent contact of people with M. ulcerans contaminated water bodies. It is conceivable that the pronounced genome contraction that is specific to the classical lineage reflects a particular adaptation of this lineage. In particular, silencing of ten (of 21) CDSs of the functional category cell wall/cell processes and seven (of 18) of the group intermediary metabolism/respiration was confined to the classical lineage (Table 2). They either fall in the category of putative candidates for immune evasion upon loss (e.g. members of the esx gene family, Mpt63, the WcaG-like epimerase MMar_2896) or are of potential regulatory relevance, like hspR_2, a probable heat shock transcriptional repressor. CDSs for PE/PPE proteins were predominantly silenced in the classical lineage within the examined 7% of the M. ulcerans genome. However, when we investigated the strictly ISE-mediated disruptions and deletions in silico in the entire genome, we found that ISEs pseudogenized 25% of all PE/PPE genes in strain Agy99 (not shown). The fact that members of this protein family were highly affected by genome shrinkage suggests a particular importance for reducing expression of such surface exposed proteins.
M. marinum only occasionally causes ulcerative but self-healing infections in humans. Without doubt, the acquisition of the virulence plasmid and the expression of the macrolide toxin mycolactone was an important step in the development of the ancestor of M. ulcerans into a mammalian pathogen. On the other hand, other mycolactone producing mycobacteria closely related to M. marinum and M. ulcerans have recently been isolated from lesions in frogs and fish, but so far not from infected humans. This indicates that additional factors contribute to the high virulence of the classical lineage of M. ulcerans. Our data indicate that, in addition to “gain of function” by acquisition of the virulence plasmid, loss of distinct anti-virulence genes, partly driven by ISE expansion (in particular of IS2606), might have equipped the classical lineage with a particular virulence and transmissibility (Fig. 5). It would be interesting to experimentally verify this hypothesis by testing these newly identified anti-virulence candidates in an appropriate in vivo model.
CDSs inactivated in RDs1 through 15 across the M. ulcerans haplotypes. CDSs are listed in the order of the M. marinum annotation. Note that only CDSs are listed where M. ulcerans haplotypes differ from each other. Thus, not all MURDs distinguishing the classical lineage from M. marinum in these regions are mentioned; these are found elsewhere. All CDSs deleted in more than one haplotype were lost in independent events except when indicated (*=probably not independently deleted). When deleted or pseudogenized, CDSs are indicated in the M. ulcerans Agy99 annotation, where possible, and in the M. marinum M annotation where no M. ulcerans orthologue exists. When found present, respective CDSs are indicated as “present”. CDSs where no M. marinum orthologue exists are indicated “na” (=not applicable). The Mexican haplotype could not be tested for all RDs that affected other haplotypes, as indicated by “nd” (=not determined); therefore, the number of CDSs deleted in the Mexican haplotype is underestimated.
We gratefully acknowledge Julia Hauser and Martin Naegeli for excellent technical assistance, A. Ross for support in data analysis, F. Portaels for provision of M. ulcerans strains, P.C. Small for provision of the M. marinum strain M and T. Stinear and J. Parkhill for kindly providing the M. marinum genome annotation.
The authors have declared that no competing interests exist.
M. Käser was supported by a research grant, KA 1842/1-1, from the Deutsche Forschungsgemeinschaft. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.