A preliminary investigation of the genetic biodiversity of Mycobacterium tuberculosis complex strains in Cameroon, a country with a high prevalence of tuberculosis, described a group of closely related M. tuberculosis strains (the Cameroon family) currently responsible for more than 40% of smear-positive pulmonary tuberculosis cases. Here, we used various molecular methods to study the genetic characteristics of this family of strains. Cameroon family M. tuberculosis strains (i) are part of the major genetic group 2 and lack the TbD1 region like other families of epidemic strains, (ii) lack spacers 23, 24, and 25 in their direct repeat (DR) region, (iii) have an identical number of repeats in 8 of 12 variable-number tandem repeats of mycobacterial interspersed repetitive unit (MIRU-VNTR) loci, (iv) have similar IS6110-restriction fragment length polymorphism (RFLP) multiband patterns (10 to 15 copies) with seven common IS6110 bands, (v) do not have an IS6110 element in their DR locus, and (vi) have four IS6110 elements in open reading frames (adenylate cyclase, phospholipase C, moeY, and ATP binding genes). Analysis by spoligotyping, MIRU-VNTR, and IS6110-RFLP typing methods revealed differences not observed in previous studies; polymorphism as assessed by MIRU-VNTR typing was lower than suggested by spoligotyping, and in rare cases, strains with identical IS6110-RFLP patterns had spoligotypes differing by as many as 15 spacers. Our findings confirm the recent expansion of this family in Cameroon and indicate that the interpretation of molecular typing results has to be adapted to the characteristics of the strain population within each setting. The knowledge of this particular genotype, with its large involvement in tuberculosis in Cameroon, allows greater refinement of tuberculosis transmission studies by interpreting data in the context of this geographic area.
Molecular epidemiology methods revolutionized the fields of research, prevention, and control of tuberculosis (TB), allowing the differentiation between strains, assessment of the overall diversity of Mycobacterium tuberculosis complex strains including differences by region and population, and measurement of the prevalence of endemic strains (28). However, few molecular epidemiological studies have been conducted in countries with a high incidence of TB. The available data suggest that families of closely related strains are common in these areas (12). The “Beijing family” is one of the most well-known families, highly prevalent in East Asia and widespread around the world (11).
Molecular analysis based on several variable genomic regions is required for a good definition of strains belonging to different families. Restriction fragment length polymorphism (RFLP) analysis based on the insertion sequence IS6110 results in a unique genotype since both the number of copies of this genetic element and its positions in the genome are variable (25, 26). Precise IS6110 insertion site mapping provides additional information on the fitness of the strain (1) given that IS6110 insertion can modify the expression of the gene involved. Another genetic element useful for characterizing tubercle bacilli is the direct repeat (DR) locus (13), a polymorphic insertion preferential locus (ipl) for IS6110. DR polymorphism can be analyzed by spoligotyping, a method involving PCR-reverse hybridization (14). The DR locus is likely to evolve more slowly than IS6110, making spoligotyping less adequate than IS6110-RFLP for discriminating strains but more convenient for investigating the biogeographic distribution of families of M. tuberculosis complex strains (32). Variable-number tandem repeats, named mycobacterial interspersed repetitive units (MIRU-VNTR), are another type of variable element in the M. tuberculosis complex genome showing extensive polymorphism (17) with a discrimination power close to that of IS6110-RFLP (6). Because of their stability, they can be used for a clear definition of families of tubercle bacilli as well.
Strains of the M. tuberculosis complex exhibit very little genome sequence diversity except in repeat sequences. Consequently, insignificant and rare genome alterations inherited and maintained through long-term evolution can be of phylogenetic value. On the basis of polymorphic nucleotides in katG codon 463 and gyrA codon 95, three genetic groups of M. tuberculosis have been identified (23). Group 1 isolates are evolutionarily the oldest, and after a loss of the region of difference TbD1, the modern lineage further evolved into two branches, group 2 and group 3 (2). Major epidemic strains including the Beijing family belong to the lineage lacking the TbD1 region.
Preliminary investigations of the genetic diversity of M. tuberculosis complex strains from the West region of Cameroon revealed that more than 40% of isolates belong to a closely related group of M. tuberculosis strains. This group was designated the “Cameroon family” (20). All strains of this family lacked spacers 23, 24, and 25 in their DR region and had closely related IS6110 ligation-mediated PCR (LM-PCR) patterns. Here, we report the use of typing methods involving various genetic markers for strains of the “Cameroon family” to more sharply define this group of M. tuberculosis strains.
The analysis was based on 45 M. tuberculosis strains belonging to the Cameroon family. These strains shared a spoligotype lacking spacers 23, 24, and 25 and had closely related IS6110 LM-PCR patterns (20). The mycobacteria were killed by heating at 90°C for 20 min, and the genomic DNA from the isolates was prepared by a standard cetyltrimethylammonium bromide-NaCl method (33).
Southern blotting and hybridization with labeled IS6110 DNA were performed as previously described by using an internationally agreed-upon protocol (26). Bionumerics software version 2.5 (Applied Maths, Kortrijk, Belgium) was used to process autoradiographs of Southern blots. IS6110-RFLP patterns were analyzed for similarity with the Dice coefficient with a band position tolerance of 1%, and a dendrogram was constructed by the unweighted pair group method using arithmetic averages (UPGMA).
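The Dice comparison described above can be illustrated with a minimal sketch. It assumes that two bands match when their sizes differ by no more than the tolerance relative to the larger band, which may not be exactly how the commercial software applies its tolerance. The function name and the three extra bands in each example pattern are hypothetical; only the seven shared band sizes come from the results reported here.

```python
def dice_similarity(bands_a, bands_b, tolerance=0.01):
    """Dice coefficient between two lists of band sizes (kb).

    Assumption: two bands match when their sizes differ by at most
    `tolerance` relative to the larger band. Each band is matched once.
    """
    unmatched_b = sorted(bands_b)
    shared = 0
    for a in sorted(bands_a):
        for i, b in enumerate(unmatched_b):
            if abs(a - b) <= tolerance * max(a, b):
                shared += 1
                del unmatched_b[i]  # a band cannot be matched twice
                break
    total = len(bands_a) + len(bands_b)
    return 2 * shared / total if total else 1.0


# Seven band sizes reported as common to most strains (kb).
common = [7.2, 5.0, 2.4, 2.2, 2.1, 1.5, 1.3]
# Hypothetical extra bands, added only to make two distinct patterns.
pattern_a = common + [9.0, 3.5, 0.9]
pattern_b = common + [8.0, 3.5, 1.0]

print(dice_similarity(pattern_a, pattern_b))
```

A matrix of such pairwise values is the usual input to UPGMA clustering; that step is not shown.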
MIRU loci 2, 4, 10, 16, 20, 23, 24, 26, 27, 31, 39, and 40 were individually amplified and analyzed as previously described (18). Results from each of the 12 loci were combined to form a 12-digit allele profile. MIRU pattern similarity was assessed by Pearson correlation, and a dendrogram was constructed by UPGMA.
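As an illustration of how the 12-digit allele profile is assembled, the following sketch concatenates the repeat count at each locus in the locus order listed above. The function names and the example repeat counts are hypothetical and are not taken from the strains studied here.

```python
# Locus order as listed in the text.
MIRU_LOCI = (2, 4, 10, 16, 20, 23, 24, 26, 27, 31, 39, 40)


def miru_profile(copy_numbers):
    """Build a 12-digit profile from a {locus: repeat count} mapping."""
    digits = []
    for locus in MIRU_LOCI:
        if locus not in copy_numbers:
            raise ValueError(f"missing MIRU locus {locus}")
        count = copy_numbers[locus]
        if not 0 <= count <= 9:
            raise ValueError(f"locus {locus}: count {count} is not one digit")
        digits.append(str(count))
    return "".join(digits)


def differing_loci(profile_a, profile_b):
    """Return the loci at which two 12-digit profiles differ."""
    return [locus for locus, a, b in zip(MIRU_LOCI, profile_a, profile_b)
            if a != b]


# Hypothetical repeat counts, for illustration only.
strain_x = {2: 2, 4: 2, 10: 3, 16: 3, 20: 2, 23: 5,
            24: 1, 26: 4, 27: 3, 31: 3, 39: 2, 40: 3}
strain_y = dict(strain_x)
strain_y[40] = 1  # same strain except for one locus

profile_x = miru_profile(strain_x)
profile_y = miru_profile(strain_y)
print(profile_x, profile_y, differing_loci(profile_x, profile_y))
```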
Southern blot hybridization with DR as a probe used the membranes blotted with PvuII-digested DNA that had previously been probed with the IS6110 fragment (13). The autoradiographs obtained were superimposed on those obtained with IS6110-RFLP to identify cohybridizing bands and to detect any IS6110 insertion into the DR region.
Primers Ris1 and Ris2 (10), corresponding to the 3′ and 5′ termini of IS6110, respectively, and the 5′-biotinylated Dra reverse primer for standard spoligotyping were used. Eighteen strains were analyzed, representative of all the different spoligotypes of strains previously included in the Cameroon family (20). The PCR products amplified with Ris1-Dra and Ris2-Dra primer pairs were subjected to the standard spoligotyping procedure (14). This procedure allowed amplification and detection of the particular DR spacers to the left and right of the IS6110 copy inserted in the DR region, depending on its orientation towards and within the DR locus (19).
LM-PCR was performed as described previously (21) to obtain the left side of each IS6110 copy with a variable stretch of its flanking sequence. Eleven strains, representative of the 11 different IS6110 LM-PCR patterns of the Cameroon family strains previously described (20), were analyzed. The PCR product was separated on 2% (wt/vol) low-melting-temperature agarose gel (Gibco-BRL Life Technologies). A 100-bp ladder served as the external molecular size marker. Sixteen different bands were excised from the gel and then purified on columns (QIAquick gel extraction kit; QIAGEN S.A., Courtaboeuf, France). The primer IS2 (21), corresponding to the left side of IS6110, was used for direct sequencing of the purified PCR products. The flanking sequences were analyzed by using Tuberculist (http://genolist.pasteur.fr/TubercuList/), a genome browser for M. tuberculosis H37Rv databases, with gapped BLAST analysis to determine the exact insertion site of each IS6110 element.
Twenty-seven different combined patterns were found among the 45 M. tuberculosis strains with the various typing methods (IS6110-RFLP, DR-RFLP, MIRU-VNTR, and spoligotyping) (Fig. 1).
The IS6110-RFLP analysis revealed 20 different patterns with about 80% similarity (Fig. 1). Twelve isolates had unique patterns, and 33 isolates belonged to eight groups of identical patterns, each containing two to seven strains. The number of IS6110 DNA-containing PvuII fragments was high in all strains, indicating that these strains contain 10 to 15 copies of IS6110. Forty-two strains, that is, all but three (patterns Cam22 and Cam23), shared seven insertion element-containing PvuII fragments, of lengths 7.2, 5, 2.4, 2.2, 2.1, 1.5, and 1.3 kb.
The MIRU-VNTR loci were studied (Fig. 1 shows the 12-digit designations). Eight MIRU loci did not display variation in their copy numbers (MIRU loci 2, 4, 10, 20, 23, 24, 31, and 39). MIRU loci 16, 26, and 27 contained two alleles each, which differed by only one repeat unit. MIRU locus 40 was also polymorphic and contained three alleles with 1, 3, and 4 repeat units, respectively. At the polymorphic loci, some alleles were more frequent than others. Thus, most of the strains (40 out of 45) contained alleles with three repeat units in MIRU locus 27, and only five strains (patterns Cam11 and Cam12) had one additional copy. In total, eight different MIRU patterns were observed in this set of isolates. One isolate had a unique pattern, and the 44 other isolates fell into seven groups of 2 to 11 strains. MIRU pattern analysis indicates 95% similarity for all the strains, except for one group of five strains (patterns Cam6, Cam7, Cam8, and Cam12) with 85% similarity. These five strains contained only one repeat unit in their MIRU locus 40, whereas the other 40 strains contained three or four units.
Most strains (42 out of 45) had a single DR-RFLP band, of which there were four variants of different sizes (Fig. 1). The presence of a single DR-RFLP band indicated that IS6110 was not inserted into the DR region of these 42 strains. Two other strains (pattern Cam22) had a two-band DR-RFLP pattern. The superimposition of DR- and IS6110-RFLP films showed that one of these bands at about 5.5 kb hybridized with both the IS6110 probe and the DR probe; thus, there was an IS6110 copy in the DR. One other strain (pattern Cam7) had a three-band DR-RFLP pattern, and two of these bands, at about 7.2 and 4.4 kb, hybridized with both the IS6110 probe and the DR probe, indicating that there were two IS6110 copies in the DR.
We tested whether the insertion of IS6110 copies into the DR region of strains with patterns Cam22 and Cam7 had resulted in hybridization signals in spoligotypes being lost. We compared the three spoligotypes obtained for each pattern with the primer pairs Dra-Drb, Ris1-Dra, and Ris2-Dra (Fig. 2). In the strain with pattern Cam22, Ris2-Dra spoligotyping revealed spacer 15, a spacer that was absent from the Dra-Drb spoligotype. Thus, spacer 15 was the DR insertion site of one copy of IS6110 as detected by superimposition of IS6110- and DR-RFLP films. In the strain with pattern Cam7, Ris1-Dra and Ris2-Dra spoligotyping detected spacers 31 and 41, neither of which was detected by Dra-Drb spoligotyping. These two spacers were the DR insertion sites of the two IS6110 copies detected by superimposition of IS6110- and DR-RFLP banding patterns. No missed signals were detected for the strains with the other patterns or for spacers 23, 24, and 25, which are deletions characterizing this family of strains.
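The comparison of the three spoligotypes amounts to a set difference: any spacer detected with an IS6110-anchored primer pair but missing from the standard Dra-Drb spoligotype is a candidate masked signal. The sketch below is illustrative only. The function name and the example spacer lists are hypothetical, apart from the absence of spacers 23, 24, and 25 and the recovery of spacer 15 reported for pattern Cam22.

```python
def masked_spacers(standard, ris1_dra, ris2_dra):
    """Spacers seen with an IS6110-anchored primer pair but absent from
    the standard (Dra-Drb) spoligotype.

    Each argument is an iterable of spacer numbers that gave a signal.
    """
    anchored = set(ris1_dra) | set(ris2_dra)
    return sorted(anchored - set(standard))


# Hypothetical standard spoligotype: 43 spacers, lacking 15 and 23 to 25.
standard = [s for s in range(1, 44) if s not in {15, 23, 24, 25}]
# Hypothetical signals from the IS6110-anchored reactions.
ris1_dra = [10, 11, 12, 13, 14]
ris2_dra = [15, 16, 17]

print(masked_spacers(standard, ris1_dra, ris2_dra))
```

Spacers 23, 24, and 25 do not appear in the result, which is the pattern expected when their absence reflects a true deletion rather than a masked signal.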
The classifications obtained by the three different approaches (IS6110, DR region, and MIRU-VNTR) were similar. Overall, the similarity among the strains was more than 70%. The greatest pattern diversity was with IS6110-RFLP analysis, and the lowest diversity was with MIRU typing (Table 1). Groups of strains shared common patterns for two markers but differed according to the third marker, the third marker being IS6110, MIRU, or DR. Thus, five strains with patterns Cam11 and Cam12 sharing the same spoligotype and IS6110-RFLP patterns had MIRU 40 patterns differing by two repeats. Interestingly, these five strains were the only strains with an additional repeat in MIRU locus 27. Strains with patterns Cam16 and Cam24 shared both IS6110-RFLP and MIRU patterns but had spoligotypes differing at 12 contiguous spacers. Seven strains (pattern Cam27) lacking a block of 15 spacers (spacers 2 to 16) presented identical IS6110-RFLP and MIRU patterns. These observations suggested that these deletions of numerous adjacent direct variable repeat (DVR) sequences in the DR region of this family of M. tuberculosis strains were recent single events rather than a series of sequential deletion events.
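The reasoning that a block of adjacent missing spacers reflects one deletion event can be sketched by grouping the spacers lost relative to a reference spoligotype into contiguous runs. Contiguity is judged along the reference, so spacers already absent from the reference do not split a run. The function name and the reference spoligotype are hypothetical; the loss of spacers 2 to 16 corresponds to the deletion reported for pattern Cam27.

```python
def deletion_blocks(reference, variant):
    """Group spacers present in `reference` but absent from `variant`
    into runs that are contiguous along the reference spoligotype.

    Each run is a candidate single deletion event.
    """
    present = set(variant)
    blocks, current = [], []
    for spacer in sorted(set(reference)):
        if spacer not in present:
            current.append(spacer)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical family reference: 43 spacers, lacking 23, 24, and 25.
reference = [s for s in range(1, 44) if s not in {23, 24, 25}]
# Variant that has additionally lost spacers 2 to 16.
variant = [s for s in reference if not 2 <= s <= 16]

blocks = deletion_blocks(reference, variant)
print(len(blocks), [len(block) for block in blocks])
```

Counting runs rather than individual spacers treats the 15-spacer difference as one event, in line with the interpretation given above.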
We determined the insertion sites of IS6110 corresponding to the six LM-PCR bands shared by all the strains of the Cameroon family (CSIP-1 [Cameroon strain insertion position 1] to CSIP-6) (Table 2). CSIP-7 to CSIP-15 corresponded to IS6110 copies present in only some of the strains. Six of the 15 sites were in intergenic regions (CSIP-1, CSIP-6, CSIP-7, CSIP-8, CSIP-12, and CSIP-15), and nine were in various open reading frames (ORFs), two of them being in unknown genes (CSIP-9 and CSIP-11). The insertion site CSIP-2 was located in the upstream region of the plcB (phospholipase C) gene, a probable virulence factor implicated in intracellular survival of the bacilli; it may act by altering cell signaling events or by direct cytotoxicity (5, 30). CSIP-3 was an IS6110 insertion site in the moeY gene (Rv1355c) involved in the biosynthesis of molybdoenzymes. The site of insertion CSIP-4 was located 57 bp upstream from Rv3377c, which is a possible cyclase similar to those involved in steroid biosynthesis, intermediary metabolism, and respiration (9). CSIP-5 was located in Rv3179, which contains an ATP-GTP binding site motif. CSIP-10, found only in strains with pattern Cam27, was located in cutI, a gene encoding a probable cutinase-like precursor that is a membrane component (5). Cutinases are typically extracellular enzymes produced by fungi and involved in the breakdown of cutin, the insoluble biopolyester that covers plant surfaces. In pathogenic fungi, cutinase allows penetration through the host plant cutin barrier during the initial stage of fungal infection (3, 27). CSIP-13, found in strains with pattern Cam20, was located in a gene of the PPE family, PPE55, implicated in virulence and in the host immune response (4). Several IS6110 insertion sites were in intergenic regions. CSIP-7 was in the ipl locus, which is part of the IS1547 insertion sequence (Rv3327) (7). It was found in strains with patterns Cam01 to Cam12, Cam15, Cam16, and Cam23.
CSIP-12 was located in the 537-bp region between the genes dnaA and dnaN (16). The DnaA protein plays an important role in the initiation and regulation of chromosomal replication. The DnaN protein is a DNA polymerase III, a complex multichain enzyme responsible for most of the replicative synthesis in bacteria. CSIP-12 was found in the strain with pattern Cam18.
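Assigning an insertion site to an ORF or to an intergenic region reduces to an interval lookup against the genome annotation. The sketch below is a minimal illustration and not the procedure used with the genome browser. The function name, gene names, and coordinates are all hypothetical and do not correspond to the H37Rv annotation.

```python
def locate_insertion(position, genes):
    """Classify a genome position against annotated genes.

    `genes` is a list of (start, end, name) tuples with 1-based inclusive,
    non-overlapping coordinates. Returns ("ORF", (name,)) or
    ("intergenic", (left_gene, right_gene)); a flank is None at the ends.
    """
    left = None
    for start, end, name in sorted(genes):
        if start <= position <= end:
            return ("ORF", (name,))
        if start > position:
            return ("intergenic", (left, name))
        left = name  # this gene ends before the position
    return ("intergenic", (left, None))


# Hypothetical annotation, for illustration only.
genes = [(100, 900, "geneA"), (1500, 2400, "geneB")]

print(locate_insertion(500, genes))
print(locate_insertion(1200, genes))
```

A fuller version would also record strand and the distance to the nearest start codon, which matters for insertions reported as lying a given number of base pairs upstream from a gene.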
The katG and gyrA genes were sequenced, and all the strains were katG463 CGG (Arg) and gyrA95 ACC (Thr). Thus, this family of M. tuberculosis strains is part of genetic group 2, as defined by Sreevatsan et al. (23). The TbD1 region was absent from all 45 strains.
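The group assignment can be expressed as a lookup on the two codons. In the sketch below, only the combination CGG at katG codon 463 with ACC at gyrA codon 95 (group 2) is taken from the result reported here; ACC encodes threonine in the standard genetic code. The combinations listed for groups 1 and 3 are given as they are usually cited for the scheme of reference 23 and should be checked against that reference. The function name is hypothetical.

```python
# (katG codon 463, gyrA codon 95) -> major genetic group.
# Entries for groups 1 and 3 are assumptions to verify against ref. 23.
GENETIC_GROUPS = {
    ("CTG", "ACC"): 1,  # katG463 Leu, gyrA95 Thr
    ("CGG", "ACC"): 2,  # katG463 Arg, gyrA95 Thr
    ("CGG", "AGC"): 3,  # katG463 Arg, gyrA95 Ser
}


def genetic_group(katg463, gyra95):
    """Return the major genetic group, or None for other combinations."""
    return GENETIC_GROUPS.get((katg463.upper(), gyra95.upper()))


print(genetic_group("CGG", "ACC"))
```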
Cameroon is a country in which the prevalence of TB is high. During the last three decades, the population structure of the tubercle bacilli causing TB has changed substantially in Cameroon. Mycobacterium africanum was previously widespread but is now less common, and a particular clade of M. tuberculosis has expanded and is currently responsible for more than 40% of smear-positive pulmonary TB cases (20). Other dominant M. tuberculosis families have been previously reported in other high-prevalence TB settings, i.e., the Beijing family in East Asia (29) and the F11 genotype M. tuberculosis in Western Cape, South Africa (31). An extensive molecular analysis is necessary to define the specific characters of these predominant families of strains. Such analysis would be an important step in the fight against TB since it can help us to understand why any particular clade is so successful.
In this study, we genetically characterized M. tuberculosis strains of the Cameroon family. Cameroon family M. tuberculosis strains, like Beijing and Haarlem family strains responsible for major epidemics, lack the TbD1 region (2). On the basis of their katG and gyrA gene sequences, they are part of the major genetic group 2, considered to be the principal cause of clustered TB cases (23).
We also analyzed polymorphism in Cameroon family strains by studying three repetitive DNA sequences, DR, MIRU-VNTR, and IS6110. These elements are the most polymorphic of the known markers in the M. tuberculosis complex. The strains lacked spacers 23, 24, and 25 in the DR region as assessed by standard spoligotyping. Using LR spoligotyping, we confirmed that the absence of hybridization was indeed due to the absence of the three spacers and not to the signals being abolished by IS6110 insertion. Other genetic features common to the Cameroon family are an IS6110-RFLP pattern sharing seven IS6110 bands and an identical number of repeats in 8 of 12 MIRU-VNTR loci. Because it is a straightforward technique, spoligotyping has been adopted for initial identification of strains of this genetically close family in Cameroon.
The chromosomal sites of integration of IS6110 can affect pathogenic behavior and drive genome evolution (1). Indeed, IS6110 frequently disrupts coding regions in clinical isolates (22). The mapping of IS6110 in two epidemic strains of M. tuberculosis (1) has suggested that IS6110 is involved in the regulation of gene expression; it is thus interesting to establish the location of each site of insertion in the chromosome. We mapped six IS6110 elements common to all the Cameroon family strains (CSIP-1 to CSIP-6). The following four of these IS6110 elements were in ORFs: (i) adenylate cyclase, involved in energy metabolism (although its role in M. tuberculosis is unclear, it is secreted from other bacterial pathogens such as Bacillus anthracis and invades a variety of eukaryotic cells where it affects both metabolism and immune response) (9); (ii) phospholipase C, a known virulence factor involved in macromolecule metabolism (30); (iii) moeY; and (iv) an ATP binding fragment. Thus, several of the ORFs disrupted in Cameroon family M. tuberculosis strains could be involved in virulence, but their roles, if any, are uncertain. Note that the pathogenesis of M. tuberculosis is not attributable to any single gene product, unlike other bacterial pathogens, where a single gene product can be critical.
We also mapped the other IS6110 elements present in some strains of the Cameroon family (CSIP-7 to CSIP-15). Two of these elements disrupted ORFs thought to be involved in virulence: (i) a PPE family gene and (ii) the gene for a cutinase-like cell wall component; cutinases can hydrolyze fatty acid esters, including mycolic acids, and they may be involved in in vivo survival and virulence (5). However, their role in M. tuberculosis remains unknown. One strain had IS6110 inserted into the 537-bp intergenic region in the origin of replication. Integration of IS6110 in the dnaA-dnaN region has previously been reported only in strains of the major genetic group 1 (16), considered to be the ancestor of groups 2 and 3. We demonstrate that strains other than those of the major genetic group 1 harbor IS6110 in the origin of replication. Overall, we report seven new insertion sites of IS6110 in the M. tuberculosis genome.
The DR locus is an IS6110 ipl in the M. tuberculosis complex genome, and most strains harbor at least one IS6110 copy in this region. The DR may have been the original point of entry for IS6110, and other copies arose from subsequent transposition events. The putative ancestral IS6110 insertion is in DVR 24, at position 15600 in the H37Rv genome sequence (5, 32). Homologous recombination between repeat sequences leading to DVR deletion seems to be the most likely mechanism for the loss of IS6110 from the DR locus (8, 32). The Cameroon family of strains lack spacers 23, 24, and 25 and have no IS6110 elements in the DR region; it is therefore possible that their common ancestor lost the ancestral IS6110 and the two DVRs adjacent to DVR 24 (DVRs 23 and 25); all the other spoligotypes in the family could have subsequently diverged from this common ancestor. The loss of spacers additional to spacers 23, 24, and 25 may have been a recent event in this family because we found strains with identical IS6110-RFLP patterns and spoligotypes differing only in these additional spacers. These observations are consistent with the Cameroon family of M. tuberculosis strains having expanded only recently in Cameroon (20).
Previous studies of the polymorphism of M. tuberculosis complex strains from various areas or from just one geographical setting led to the adoption of certain rules for the interpretation of fingerprints to assess the relationships among the isolates (Table 3), and these rules are as follows: (i) IS6110-RFLP is the fingerprint method with the highest discriminatory power for M. tuberculosis with five or more IS6110 copies, and secondary typing with another molecular marker cannot differentiate among isolates with identical IS6110-RFLP patterns (15), and (ii) MIRU-VNTR typing has a discriminatory power greater than that of spoligotyping (24). Our study of strains from one country using three molecular markers suggests that different rules for the interpretation of molecular typing results may be appropriate. Although IS6110-RFLP was indeed the most discriminatory method, in some cases spoligotyping or MIRU-VNTR typing differentiated strains with identical IS6110-RFLP patterns. This result may be due to the close genetic relationship of Cameroon family M. tuberculosis strains. Differences in spoligotypes involved blocks of as many as 15 spacers. As they were contiguous, these patterns are presumably the consequences of single genetic deletion events, and thus they have the same significance with respect to time as the loss of a single spacer. Differences in the MIRU-VNTR genotype between strains with identical IS6110-RFLP patterns were limited to a single locus, usually MIRU 40. MIRU 40 was also the most polymorphic of these loci in this study, consistent with previous studies (18). Because the Cameroon family strains probably expanded recently, our observations support the previously suggested faster molecular clock for MIRU 40 (18). MIRUs 16 and 26, two other polymorphic MIRUs in Cameroon family strains, were also previously found among the MIRUs with the highest allelic diversity (18).
In contrast, MIRUs 10, 23, and 31, previously reported as highly diverse, seemed “frozen” in Cameroon family strains, whereas MIRU 27 was polymorphic in spite of its very restricted allele distribution in other populations of strains (18). We also found, in contrast with previous studies (6, 24), that MIRU-VNTR typing was less discriminatory than spoligotyping. Our study highlights the complexity of interpreting fingerprint data for highly related strains. Interpretation of molecular typing needs to be appropriate to the specific characteristics of the strain population within each setting. To identify strains in epidemiological studies in Cameroon, neither IS6110-RFLP nor MIRU-VNTR typing can be used alone; several markers must be used.
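One common way to put a number on the discriminatory power of a typing method is the Hunter-Gaston index, the probability that two strains drawn at random without replacement fall into different types. The text does not state which measure underlies the comparisons above, so the sketch below is offered only as an illustration of that index. The function name and the cluster sizes are hypothetical and are not the cluster sizes observed in this study.

```python
def hunter_gaston(cluster_sizes):
    """Hunter-Gaston discriminatory index.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), where n_j is the number
    of strains of type j and N is the total number of strains.
    """
    total = sum(cluster_sizes)
    if total < 2:
        raise ValueError("at least two strains are required")
    same_type_pairs = sum(n * (n - 1) for n in cluster_sizes)
    return 1 - same_type_pairs / (total * (total - 1))


# Hypothetical typing result: ten strains falling into four types.
sizes = [4, 3, 2, 1]
print(round(hunter_gaston(sizes), 4))
```

Applying the same function to the cluster sizes obtained with each marker on the same set of strains gives values that can be compared directly.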
Cameroon family M. tuberculosis strains could have some selective advantage over other M. tuberculosis genotypes present in Cameroon involving virulence, transmissibility, or the ability to interact with the host immune defense system. However, other possibilities to explain this predominance, like geographic confinement, cannot be excluded. In any case, a comprehensive understanding of this particular genotype with a large impact on TB in the country is valuable to allow greater refinement of TB transmission studies by interpreting data in the context of this geographic area.
S.N.N.-E. received a research fellowship from the Agence Universitaire de la Francophonie. This work benefited from the EU Concerted Action Project “New Generation Genetic Markers and Techniques for the Epidemiology and Control of Tuberculosis” (QLK2-CT-2000-630).