

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
Int J Tuberc Lung Dis. Author manuscript; available in PMC 2013 March 18.
Published in final edited form as:
PMCID: PMC3600895

Strain classification of Mycobacterium tuberculosis: congruence between large sequence polymorphisms and spoligotypes


Spoligotyping is used in molecular epidemiological studies, and signature patterns have been used to define strain families. However, the markers used for spoligotyping are subject to homoplasy, which could produce identical spoligotypes in phylogenetically unrelated strains. We determined the accuracy of strain classification based on spoligotyping, using the six main lineages defined by large sequence polymorphisms (LSPs) and single nucleotide polymorphisms (SNPs) as the gold standard. Of 919 Mycobacterium tuberculosis isolates, 870 (95%) were classified into a spoligotype family. All strains from a given spoligotype family belonged to the same lineage, and we found no convergent spoligotypes (identical spoligotypes in strains from different lineages). Spoligotype families appear to be sub-lineages within the main lineages.

Keywords: Mycobacterium tuberculosis, strain classification, genotype, spoligotyping, large sequence polymorphism

HIGHLY DISCRIMINATORY MARKERS are needed for molecular epidemiology studies to track particular strains in the community, and robust markers exhibiting low homoplasy and minimal rates of convergent evolution are needed for phylogenetic and population genetic studies.1

Genotyping using large sequence polymorphisms (LSPs) defines six main lineages of the Mycobacterium tuberculosis complex:2 Euro-American, East-Asian, Indo-Oceanic, East-African-Indian (EAI) and the West-African lineages 1 and 2 (also known as M. africanum). The robustness of LSPs as phylogenetic markers has recently been validated by multilocus DNA sequence analysis.3 There is evidence that these M. tuberculosis lineages may have different phenotypic properties.4 Moreover, associations between host genotypes and M. tuberculosis lineages have been reported, supporting a co-evolutionary scenario for M. tuberculosis and its human host.2

Spoligotyping of M. tuberculosis is used to study tuberculosis transmission.5 It is based on polymorphisms in the direct repeat (DR) locus, which consists of 36-base-pair DR copies interspersed with non-repetitive spacer sequences. Of the 94 known spacer sequences, 43 are used for spoligotyping. Several signature spoligotype patterns have been identified, such as those of the Beijing, Cameroon and Central Asian (CAS)/Delhi families.6
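The paper does not describe how spoligotypes are recorded. A common convention in SpolDB4-style databases (reference 6) writes the presence/absence pattern of the 43 spacers as a 15-digit octal code: spacers 1 to 42 are read in 14 groups of three as binary numbers, and spacer 43 becomes the final digit. The sketch below assumes that convention; the function name is hypothetical and not part of any published tool.

```python
def spoligotype_to_octal(pattern: str) -> str:
    """Convert a 43-character spacer string ('1' = present, '0' = absent)
    into a 15-digit octal code (assumed SpolDB4-style convention)."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters of '0'/'1'")
    digits = []
    # Spacers 1-42: 14 groups of three, each read as a 3-bit binary number.
    for i in range(0, 42, 3):
        digits.append(str(int(pattern[i:i + 3], 2)))
    # Spacer 43 on its own becomes the final digit (0 or 1).
    digits.append(pattern[42])
    return "".join(digits)


# Beijing prototype: spacers 1-34 absent, spacers 35-43 present.
beijing = "0" * 34 + "1" * 9
print(spoligotype_to_octal(beijing))  # 000000000003771
```

Encoding the pattern compactly makes identical spoligotypes easy to compare as plain strings, which is all that the grouping steps in this study require.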

We recently showed that the spacers used in spoligotyping exhibit homoplasy (i.e., independent mutational events that result in the loss of the same spacer), which makes spoligotyping an unreliable tool for formal phylogenetic analyses.1 Homoplasy in individual spacers could lead spoligotypes of unrelated strains to converge to identical patterns (i.e., the same spoligotypes may be observed in strains that belong to different evolutionary lineages). Hence, the phylogenetic robustness of the spoligotype families as currently defined is unclear.
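A toy model may make the convergence mechanism concrete. It assumes, for illustration only, that each mutational event deletes a contiguous block of spacers and that spacers are never regained; the spacer numbers and both strain histories are hypothetical.

```python
# Toy model: a spoligotype is the set of spacers (1-43) still present.
ANCESTOR = frozenset(range(1, 44))  # hypothetical ancestral pattern, all 43 spacers


def delete_block(present: frozenset, first: int, last: int) -> frozenset:
    """One deletion event removing spacers first..last (inclusive).
    Spacers are never regained, so the set can only shrink."""
    return present - frozenset(range(first, last + 1))


# Two hypothetical strains from different lineages with different histories.
strain_a = delete_block(ANCESTOR, 20, 22)  # lineage A lost spacers 20-22
strain_b = delete_block(ANCESTOR, 21, 21)  # lineage B lost spacer 21 only
print(strain_a == strain_b)  # False: the patterns still differ

# Later, each lineage independently loses a larger block spanning 18-24.
strain_a = delete_block(strain_a, 18, 24)
strain_b = delete_block(strain_b, 18, 24)
print(strain_a == strain_b)  # True: unrelated strains now share a spoligotype
```

Because loss is one-directional, a later independent deletion can erase the earlier differences between lineages, which is why identical spoligotypes alone do not guarantee common ancestry.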

In this study, we used LSP and single nucleotide polymorphism (SNP) based genotyping as a gold standard to test the accuracy of strain classification based on spoligotyping, and to determine the frequency of convergent spoligotypes among isolates from different LSP/SNP-based lineages.


We performed a retrospective analysis of genotyping data from a population-based study in San Francisco, approved by the University of California San Francisco Human Research Protection Program. Spoligotyping was performed using a standardized method7 or the Luminex technology.8 Each spoligotype was assigned to a spoligotype family based on published definitions (Table).6,9 LSP/SNP-based genotyping was performed as described previously.2
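The full family definitions are given in references 6 and 9 and are not reproduced here. As a deliberately partial sketch, the function below implements only two illustrative rules: a complete pattern (all 43 spacers present) and a commonly cited Beijing signature (spacers 1 to 34 absent, at least three of spacers 35 to 43 present). The Beijing rule is an assumption to be checked against reference 6, and the function name is hypothetical.

```python
from typing import Optional


def assign_family(pattern: str) -> Optional[str]:
    """Assign a 43-spacer pattern ('1' = present, '0' = absent) to a category.

    Only two illustrative rules are implemented; every other pattern is
    returned as None (unassigned), unlike the full published definitions.
    """
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters of '0'/'1'")
    present = {i + 1 for i, c in enumerate(pattern) if c == "1"}
    if len(present) == 43:
        return "all spacers present"
    # Assumed Beijing signature: spacers 1-34 absent, >= 3 of spacers 35-43 present.
    if not present & set(range(1, 35)) and len(present & set(range(35, 44))) >= 3:
        return "Beijing"
    return None


print(assign_family("0" * 34 + "1" * 9))   # Beijing
print(assign_family("1" * 10 + "0" * 33))  # None (no implemented rule matches)
```

Rule-based assignment of this kind is what allows a fraction of isolates to remain unassigned when no signature matches.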

Table. Definition of the prototype of each spoligotype family, the corresponding LSP/SNP-defined lineage, and their frequency in San Francisco.


Lineage refers to the M. tuberculosis classification based on LSP/SNP; family refers to the classification based on spoligotyping. Convergent spoligotypes are identical spoligotypes observed in isolates that belong to different LSP/SNP-defined lineages.


We assessed the congruence of the spoligotype families with the LSP/SNP-based lineages using the kappa (κ) statistic of agreement. We searched for convergent spoligotypes among isolates that shared the same spoligotype with at least one other isolate.
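The exact kappa computation is not specified. The sketch below assumes Cohen's kappa on paired labels per isolate, where one label is the lineage predicted from the isolate's spoligotype family and the other is its LSP/SNP-defined lineage. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two paired classifications of the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each classification's marginal frequencies.
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both use one identical category: treat as perfect agreement
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: lineage predicted from spoligotype family vs LSP/SNP lineage.
predicted = ["Euro-American", "East-Asian", "Indo-Oceanic", "Euro-American"]
lsp_snp = ["Euro-American", "East-Asian", "Indo-Oceanic", "Euro-American"]
print(cohens_kappa(predicted, lsp_snp))  # 1.0 (perfect agreement)
```

Kappa corrects raw agreement for the agreement expected by chance, so κ = 1 indicates that every paired label matched.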


Of 926 M. tuberculosis isolates with LSP/SNP-based lineage and spoligotyping information available, seven were excluded: three because the identifiers were not unique, and four because the patients were infected with more than one strain of M. tuberculosis. This left 919 isolates for analysis.

Based on LSP/SNP genotyping of the 919 isolates, 481 (52.3%) belonged to the Euro-American lineage, 235 (25.6%) to the Indo-Oceanic lineage, 198 (21.5%) to the East-Asian lineage and 5 (0.5%) to the EAI lineage. Based on spoligotyping, 870 (95%) isolates were assigned to a spoligotype family. The most frequent families were EAI (n = 211, 23.0%), Beijing (n = 192, 20.9%) and T (n = 182, 19.8%); 49 isolates (5.3%) were not assigned to any family.
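The reported counts and percentages can be rechecked with a few lines of arithmetic. The sketch uses only numbers stated in the text; the variable and function names are illustrative.

```python
# Counts copied from the text; TOTAL reproduces the exclusions described above.
TOTAL = 926 - 3 - 4  # 3 non-unique identifiers, 4 mixed infections -> 919

LINEAGE_COUNTS = {
    "Euro-American": 481,
    "Indo-Oceanic": 235,
    "East-Asian": 198,
    "East-African-Indian": 5,
}
FAMILY_COUNTS = {"EAI": 211, "Beijing": 192, "T": 182}
UNASSIGNED = 49


def pct(n: int, total: int = TOTAL) -> float:
    """Percentage of total, rounded to one decimal place as in the text."""
    return round(100 * n / total, 1)


# Every isolate falls into exactly one LSP/SNP lineage.
assert sum(LINEAGE_COUNTS.values()) == TOTAL

for name, n in {**LINEAGE_COUNTS, **FAMILY_COUNTS}.items():
    print(f"{name}: {n} ({pct(n)}%)")

assigned = TOTAL - UNASSIGNED
print(f"assigned to a family: {assigned} ({pct(assigned)}%)")  # 870 (94.7%)
print(f"unassigned: {UNASSIGNED} ({pct(UNASSIGNED)}%)")
```

The 94.7% for isolates assigned to a family is what the text reports, rounded, as 95%.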

The relationship between the lineages and the spoligotype families is shown in the Table. All isolates within each spoligotype family belonged to the same LSP/SNP-based lineage. Specifically, isolates of the Beijing family and those with a spoligotype in which all spacers are present belonged to the East-Asian lineage; isolates of the T, Haarlem (H), X, Latin-American-Mediterranean (LAM) and S families belonged to the Euro-American lineage; isolates of the EAI and MANU families belonged to the Indo-Oceanic lineage; and isolates of the CAS family belonged to the EAI lineage (Figure). Despite the shared name, the EAI spoligotype family therefore corresponds to the Indo-Oceanic lineage, not to the East-African-Indian (EAI) lineage. Because spoligotyping is widely used in molecular epidemiological studies, existing spoligotype data could be used to infer the main M. tuberculosis lineage in most cases. This information could be important, given that different lineages of M. tuberculosis have been suggested to differ in clinical and epidemiological behaviour,4 and lineage may affect the effectiveness of new diagnostics, drugs or vaccines.
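The family-to-lineage correspondence observed in San Francisco can be written as a simple lookup table. This is a minimal sketch: the mapping is taken from the paragraph above, it may not hold in populations that include other lineages, and the names are illustrative.

```python
from typing import Optional

# Correspondence observed in the San Francisco collection (see text and Figure).
FAMILY_TO_LINEAGE = {
    "Beijing": "East-Asian",
    "all spacers present": "East-Asian",
    "T": "Euro-American",
    "H": "Euro-American",
    "X": "Euro-American",
    "LAM": "Euro-American",
    "S": "Euro-American",
    "EAI": "Indo-Oceanic",  # EAI spoligotype family, NOT the EAI lineage
    "MANU": "Indo-Oceanic",
    "CAS": "East-African-Indian",
}


def infer_lineage(family: Optional[str]) -> Optional[str]:
    """Main LSP/SNP lineage inferred from a spoligotype family, or None
    for unassigned isolates and families outside the observed mapping."""
    if family is None:
        return None
    return FAMILY_TO_LINEAGE.get(family)


print(infer_lineage("EAI"))  # Indo-Oceanic
print(infer_lineage("CAS"))  # East-African-Indian
```

Returning None for unassigned isolates mirrors the 5% of isolates whose lineage could not be inferred from spoligotyping.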

Figure. Association between the three main LSP/SNP-based lineages of M. tuberculosis in San Francisco and their respective spoligotype families. These families correspond to sub-lineages within the main LSP/SNP-based lineages. LSP = large sequence polymorphism; ...

Of the 919 isolates, 141 (15%) had a unique spoligotype and 778 (85%) had a spoligotype identical to that of at least one other isolate. These 778 isolates shared 76 distinct spoligotypes. All isolates with the same spoligotype belonged to the same lineage; we therefore found no evidence of convergent spoligotypes (κ = 1). However, the present analysis focused only on the main LSP/SNP-based lineages, and we did not test for convergence of spoligotypes within LSP/SNP-based sub-lineages. Although we did not detect convergent spoligotypes across the main lineages of M. tuberculosis, convergence may still occur within individual sub-lineages.
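The grouping described in this paragraph can be sketched as follows: isolates are grouped by spoligotype, unique and shared spoligotypes are counted, and any shared spoligotype observed in more than one lineage is flagged as convergent. The function name and toy records are hypothetical; "type_B" is made convergent on purpose to show what a positive finding would look like.

```python
from collections import defaultdict


def convergence_summary(isolates):
    """Summarize (spoligotype, lineage) pairs, one pair per isolate.

    Returns (isolates with a unique spoligotype, isolates sharing a spoligotype,
    number of shared spoligotypes, sorted list of convergent spoligotypes).
    """
    lineages_by_type = defaultdict(list)
    for spoligotype, lineage in isolates:
        lineages_by_type[spoligotype].append(lineage)
    shared = {t: ls for t, ls in lineages_by_type.items() if len(ls) > 1}
    n_unique = len(lineages_by_type) - len(shared)
    n_shared_isolates = sum(len(ls) for ls in shared.values())
    # Convergent: one spoligotype observed in more than one lineage.
    convergent = sorted(t for t, ls in shared.items() if len(set(ls)) > 1)
    return n_unique, n_shared_isolates, len(shared), convergent


toy = [
    ("type_A", "East-Asian"),
    ("type_A", "East-Asian"),
    ("type_B", "Euro-American"),
    ("type_B", "Indo-Oceanic"),  # hypothetical convergence
    ("type_C", "Euro-American"),
]
print(convergence_summary(toy))  # (1, 4, 2, ['type_B'])
```

Applied to the San Francisco data, this summary corresponds to 141 unique isolates, 778 isolates sharing 76 spoligotypes, and an empty list of convergent spoligotypes.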

A further limitation of this study is that the strains analysed are not representative of the global diversity of M. tuberculosis: only three of the six main lineages2 were well represented, with five isolates from the EAI lineage and none from the two West-African lineages. It is therefore possible that spoligotypes from the other three lineages (EAI and West-African lineages 1 and 2) are less informative, or that they are subject to convergent evolution.


In San Francisco, spoligotyping could be used to classify M. tuberculosis into the main strain lineages in 95% of cases. Spoligotype families are sub-lineages within the main lineages defined by LSP/SNP; it is therefore important to emphasize that spoligotype families and the corresponding LSP/SNP-based lineages are not ‘phylogenetically equivalent’, as the respective genetic markers do not define the same branches of the phylogeny. Moreover, spoligotyping cannot be used to infer phylogenetic relationships among strain families.3 Further studies are needed to determine the accuracy of spoligotype-based strain classification in a collection that includes all six lineages of M. tuberculosis, and to assess the performance of spoligotyping within the different LSP/SNP-based sub-lineages.


The authors express their appreciation to the staff of the Mycobacteriology Section, San Francisco Department of Public Health Laboratory, and the Microbial Diseases Laboratory, California Department of Public Health, for the spoligotyping data. This study was supported by a grant from the National Institutes of Health (AI 034238). SG was supported by the Medical Research Council, UK, and the National Institutes of Health grant HHSN266200700022C.


1. Comas I, Homolka S, Niemann S, Gagneux S. Genotyping of genetically monomorphic bacteria: DNA sequencing in Mycobacterium tuberculosis highlights the limitations of current methodologies. PLoS ONE. 2009;4:e7815.
2. Gagneux S, DeRiemer K, Van T, et al. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci USA. 2006;103:2869–2873.
3. Comas I, Gagneux S. The past and future of tuberculosis research. PLoS Pathog. 2009;5:e1000600.
4. Thwaites G, Caws M, Chau TT, et al. Relationship between Mycobacterium tuberculosis genotype and the clinical phenotype of pulmonary and meningeal tuberculosis. J Clin Microbiol. 2008;46:1363–1368.
5. Cowan LS, Diem L, Monson T, et al. Evaluation of a two-step approach for large-scale, prospective genotyping of Mycobacterium tuberculosis isolates in the United States. J Clin Microbiol. 2005;43:688–695.
6. Brudey K, Driscoll JR, Rigouts L, et al. Mycobacterium tuberculosis complex genetic diversity: mining the fourth international spoligotyping database (SpolDB4) for classification, population genetics and epidemiology. BMC Microbiol. 2006;6:23.
7. Kamerbeek J, Schouls L, Kolk A, et al. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol. 1997;35:907–914.
8. Cowan LS, Diem L, Brake MC, Crawford JT. Transfer of a Mycobacterium tuberculosis genotyping method, spoligotyping, from a reverse line-blot hybridization, membrane-based assay to the Luminex multianalyte profiling system. J Clin Microbiol. 2004;42:474–477.
9. Flores L, Van T, Narayanan S, Deriemer K, Kato-Maeda M, Gagneux S. Large sequence polymorphisms classify Mycobacterium tuberculosis with ancestral spoligotyping patterns. J Clin Microbiol. 2007;45:3393–3395.
10. Filliol I, Driscoll JR, Van Soolingen D, et al. Global distribution of Mycobacterium tuberculosis spoligotypes. Emerg Infect Dis. 2002;8:1347–1349.