Some recent papers have dealt with the construction of spoligotyping databases (20). Soini et al. (20) described a study of 1,429 M. tuberculosis isolates from 1,283 patients as part of an ongoing population-based TB epidemiology study in Houston, Tex. This paper was soon followed by a report of the biogeographical distribution of 3,319 spoligotype patterns and 259 shared types from 47 countries worldwide (22). The first study essentially focused on isolates from patients residing in a single state in the United States, whereas in the second study, >73% of the isolates described were from Europe and the United States. Despite these limitations, the studies underlined the fact that a significant number of M. tuberculosis isolates in circulation were essentially confined to specific geographic locations (20). By including new spoligotyping data from all over the world, SpolDB3.0 has increased the overall representation; nonetheless, a more representative description of the worldwide diversity of tubercle bacilli should be possible through the acquisition of information from Asian and African countries.
The construction of global polymorphism databases constitutes a powerful tool, as it permits a quantitative estimation of the measure of DNA variations at the chromosomal level by the number of genetic structures observed so far. Similarly to what is done in Drosophila melanogaster population genetics, where inversions have been classified as “common ubiquitous, rare endemic, recurrent endemic and unique endemic” (24), we attempted to categorize most of the geographic variations in the DR loci observed so far by spoligotyping, so as to have a better knowledge of moving and expanding clones of M. tuberculosis. For this purpose, we introduced new indices (MC and SI) and qualifiers (C1 and C2) in order to better describe the spatiotemporal status of natural populations of the M. tuberculosis complex. For the spatial distribution, the populations studied were defined as endemic, localized, or ubiquitous. For the quantitative distribution, the populations were defined as epidemic, common, recurrent, or rare. These definitions synthetically define a spatiotemporal status for each shared type and, together with its genetic structure, may provide a global idea of its evolutionary history.
The results obtained also underline the well-known fact that casual contacts and sporadic cases, although difficult to detect, are responsible for most of the microepidemics and constitute an important means of TB transmission (6). Our next objective is to better describe the genetic diversity of the M. tuberculosis complex worldwide, which may be achieved by recruitment of adequate clinical isolates or DNA samples or inclusion of representative spoligotyping data in the database. Construction of new mathematical models that permit an interpretation based on the combination of DNA fingerprinting, epidemiological, and demographical data should further improve our knowledge of evolutionary processes that intervene in the development and spread of infectious diseases.
Regarding the genetic variability of the DR locus, it was recently shown to be a part of a larger family of sequence repeats among prokaryotes (11). Much remains to be done to precisely define the potential phylogenetic links within various alleles of this locus, as well as to investigate potential links that are found across individual studies targeting local epidemiological issues, particularly since TB does not respect man-made frontiers. Little is also known about the microevolutionary events associated with the DR locus and how they may influence the interpretation of both spoligotyping and IS6110 RFLP data (31). Indeed, different isolates from the same strain family and isolates from different strain families may, in rare cases, converge to give the same spoligotype pattern (31). Though of limited importance, this bias may be investigated in detail in the future by using second-generation spoligotyping based on a set of new spacer oligonucleotides (26) or by assessment of other genetic markers (18) in selected strains. The management of such projects will be facilitated by automation of data entry and data mining to further update SpolDB3.0 (1). The data acquisition, similarity search, and matching process; labeling; and translation from binary to octal format and vice versa are already automated, and future data exchange and internetworking of SpolDB3.0 with other databases (such as IS6110 RFLP or mycobacterial interspersed repetitive units) should soon allow new queries to be screened against an updated version.
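The translation between binary and octal formats mentioned above can be illustrated with a minimal sketch. It assumes the commonly used 43-spacer spoligotype format and the usual octal convention, in which the first 42 spacers are read as 14 triplets (one octal digit each) and the 43rd spacer is appended as a final 0 or 1. The function names `binary_to_octal` and `octal_to_binary` are hypothetical and are not taken from SpolDB3.0, whose actual implementation is not described here.

```python
def binary_to_octal(pattern: str) -> str:
    """Convert a 43-character binary spoligotype ('1' = spacer present)
    into a 15-digit octal code (assumed convention, see lead-in)."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42: 14 triplets, each read as one octal digit.
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is carried over unchanged as the 15th digit.
    return "".join(digits) + pattern[42]


def octal_to_binary(code: str) -> str:
    """Inverse conversion: 15-digit octal code back to 43 binary characters."""
    if (len(code) != 15
            or set(code[:14]) - set("01234567")
            or code[14] not in "01"):
        raise ValueError("expected 14 octal digits followed by '0' or '1'")
    # Each of the first 14 digits expands to a zero-padded 3-bit triplet.
    triplets = [format(int(d), "03b") for d in code[:14]]
    return "".join(triplets) + code[14]


# Illustrative pattern: spacers 1-34 absent, spacers 35-43 present.
example = "0" * 34 + "1" * 9
octal = binary_to_octal(example)
print(octal)
print(octal_to_binary(octal) == example)
```

Because the conversion is lossless in both directions, either representation could serve as the key for exact-match searches between a query pattern and stored shared types.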
The ease with which matches between potentially linked strains can be detected may make SpolDB3.0 a new tool for international studies of TB transmission. Indeed, the detection of a match between two rare profiles in SpolDB3.0 may be a start to gathering complementary genotyping information, such as IS6110 RFLP or polymorphic GC-rich-sequence RFLP in other international databases, to demonstrate clonality of the studied isolates (17) and to detect unsuspected epidemiological links. In conclusion, SpolDB3.0 constitutes a potential tool for global TB epidemiology and population genetics and M. tuberculosis complex taxonomy and phylogeny. It underlines major differences in the population structures of tubercle bacilli within the eight subcontinents studied, and by using new indices and qualifiers, it has led to better interpretation methods and the possibility of future comparison with other methods, such as mycobacterial interspersed repetitive units (18). Nevertheless, further work is still needed to get a more exhaustive global picture of worldwide tubercle bacillus genetic variability. Another major issue will be the ability to link this genetic diversity to virulence and/or fitness factors and ultimately to the genetic predisposition factors of the human or animal hosts.