Despite being a curable disease, tuberculosis (TB) killed more people in 2009 than during any previous year in history. Progress in TB research has been slow, and remains burdened by important gaps in our knowledge of the basic biology of Mycobacterium tuberculosis, the causative agent of TB, and its interaction with the human host. Fortunately, major systems biology initiatives have recently been launched that will help fill some of these gaps. However, to fully comprehend TB, and control this disease globally, current systems biological approaches will not suffice. The influence of host and pathogen diversity, changes in human demography, and socioeconomic and environmental factors will also need to be considered. Such a multidisciplinary approach might be best described as ‘systems epidemiology’ in an effort to overcome the traditional boundaries between basic biology and classical epidemiology.
Tuberculosis (TB) is caused by a group of closely related Gram-positive bacilli, collectively known as the Mycobacterium tuberculosis complex (MTBC). MTBC comprises the typical human pathogens M. tuberculosis and Mycobacterium africanum, as well as variants affecting various animal species. These animal pathogens include Mycobacterium bovis (a pathogen of cattle), Mycobacterium caprae (goats and sheep), Mycobacterium microti (voles), and Mycobacterium pinnipedii (seals and sea lions). Contrary to many other pathogenic bacteria, MTBC does not have classical virulence factors such as recently acquired pathogenicity islands, nor does it produce any toxin. Yet, MTBC is able to efficiently infect, survive and transmit among hosts. According to estimates by the World Health Organization, one-third of the world’s human population is latently infected with MTBC, and 1.7 million people die of TB each year. The outcome of TB infection and disease is highly variable, ranging from complete elimination of the bacteria by innate immunity to classical pulmonary disease, disseminated TB, and death. In 90% of cases, the infection remains latent, while the remaining 10% of infected individuals will develop active disease at some point during their lifetime. The intimate cross-talk between the bacteria and the host immune system is one of the main complexities determining these variable outcomes. In order to better control TB globally, new tools are urgently needed, in particular better diagnostics, new antibiotics, and better vaccines.
In most parts of the world, active TB is still diagnosed by sputum microscopy. However, this technique has limited sensitivity, and up to 50% of cases are routinely missed. Although bacterial culture is the current gold standard for detecting TB, this technique takes up to 4 weeks, and requires skilled technicians and well-equipped laboratories, all of which are rarely available in developing countries. Fortunately, novel and highly sensitive molecular tools are being developed which show great promise for rapid detection of active TB. One additional difficulty in diagnosing TB is to reliably differentiate between latent and active disease. Contrary to the traditional view of TB as a simple binary state of active versus latent disease, the manifestation of TB is currently thought to represent a whole spectrum of infection (Figure 1). Ideally, biomarkers should be available that allow patients to be classified according to this spectrum. Such biomarkers would be particularly valuable if they identified the infected individuals most likely to progress to active TB. A recent study suggests this may become possible.
Apart from the difficulties in diagnosing TB, the standard treatment against TB is also complicated, as it involves a six-month regimen with multiple antibiotics. Long-term treatments are inherently problematic, as patient non-adherence or drug shortages can lead to the development of drug resistance. Drug-resistant strains of MTBC started to appear shortly after the introduction of streptomycin in 1943. Today, many regions of the world report increases in multidrug-resistant TB, and some MTBC strains are now resistant to all available drugs. Adding to the problem of drug-resistant TB is that no new anti-TB drug has been licensed since the discovery of ethambutol in the 1960s. Following the onset of the HIV epidemic in the early 1980s, overconfidence regarding old antibiotics combined with long-term neglect in TB research and surveillance led to a re-emergence of the disease in the developed world. While TB had presumably never actually declined in the developing world, HIV had a dramatic impact on TB incidence, particularly in sub-Saharan Africa. Fortunately, the development of new drugs and shorter treatment regimens against TB is back on the agenda.
In addition to new diagnostics and new antimicrobials, a better vaccine against TB is urgently needed. Considering the large pool of latently infected individuals, prevention of TB infection and disease through vaccination might be the only realistic way of controlling global TB in the long run. However, the Bacille Calmette-Guérin (BCG) vaccine is the only currently approved vaccine against TB, and its efficacy against pulmonary TB in adults is questionable, with estimates ranging from 0 to 80%. Yet, BCG remains the most widely used vaccine in the world because it protects children against TB meningitis, the most severe form of the disease. BCG was derived in the first quarter of the 20th century from a virulent strain of M. bovis. The reasons for the observed variation in protective efficacy of BCG are unclear, although differences among BCG strains, exposure to environmental mycobacteria, and human genetic diversity have been invoked. Currently, several new vaccine candidates are at various stages of development. Yet, despite significant progress over the past 20 years, a new and broadly effective vaccine against TB will not be available any time soon. A significant hurdle in TB vaccinology is our limited understanding of what constitutes protective immunity (Box 1). This, as well as many other important gaps in knowledge, will need to be filled before an effective TB vaccine can become a reality.
There is increasing consensus among the TB research community that systems biology will play an important role in generating new insights relevant to the development of new diagnostics, drugs, and vaccines against TB [5, 19]. The chronic nature of the disease, characterized by a complex dialog between the host immune system and the pathogen, combined with features such as latency, a complex mycobacterial cell wall, and the phenomenon of antimicrobial persistence, all call for more comprehensive approaches to study the biology of TB (Box 1). In this review, we start by briefly reviewing recent advances in applying systems biology to TB research. We then discuss why systems biology should be combined with complementary approaches to understand and control TB globally. Finally, we review recent data on the genetic diversity and evolution of MTBC, and end by proposing a new hypothesis on the evolution of virulence in MTBC, which, if confirmed, could impact the future spread and control of TB in the world.
Two major systems biology initiatives to study TB have recently been funded by the National Institutes of Health/National Institute of Allergy and Infectious Diseases (NIH/NIAID) and the EU, respectively (Box 2). Systems biology relies on various high-throughput technologies combined with computational modeling to integrate complex biological data. In particular, next-generation sequencing is revolutionizing biology at many levels. Thanks to these new technologies, transcriptomes and transcription factor binding sites will be studied to an extent never before possible. Proteomics and metabolomics are also gaining momentum in TB research. Yet, significant analytical problems remain, including finding ways to incorporate all these different types of omics data into a form that not only makes biological sense, but also allows theoretical models to be designed that can be tested in the laboratory. In TB, such models have been used to study the formation of lung granulomas. These complex structures are formed by immune cells and are believed to wall off mycobacteria from the rest of the body.
TB Systems Biology Program (http://www.broadinstitute.org/annotation/tbsysbio): a tuberculosis systems biology initiative funded by the National Institute of Allergy and Infectious Diseases (NIAID). The NIAID is also funding systems biology initiatives for other pathogens such as influenza, enteropathogens and emerging respiratory viruses.
SysteMTb (http://www.systemtb.org): a systems biology initiative funded by the Seventh Framework Programme of the European Union. The project involves 13 partners and collaborates with the TB Systems Biology Program.
Immune Epitope Database and Analysis Resource (http://www.immuneepitope.org): contains data on B-cell and T-cell epitopes for different hosts and includes epitopes from pathogens such as M. tuberculosis but also information on autoimmune diseases .
The Allele Frequency Net Database (http://www.allelefrequencies.net): a resource that centralizes the information available regarding frequency and geographical distribution of different polymorphic areas of the human genome. The database includes available data on frequency of HLA alleles across many countries .
TB Database (http://www.tbdb.org): an online database, which centralizes updated experimental data relevant to tuberculosis research. It includes genomic databases to compare M. tuberculosis to other mycobacterial genomes, an analysis of mutations between different clinical strains representative of the MTBC lineages, and extended transcriptome data .
Even if systems biology gave us a complete picture of the cellular processes involved in host-pathogen interactions in TB, there are still many factors inherent to infectious diseases, and particularly to TB, that are not addressed by most current systems biological approaches. These factors are nevertheless crucial for understanding, and ultimately controlling, TB globally. TB in Europe started to decrease long before the introduction of biomedical interventions (Figure 2a). Improved living conditions, better nutrition and sanitation are believed to be responsible for this decline. The fact that this trend continued well into the 20th century raises questions as to the actual contribution of BCG vaccination and chemotherapy to the overall reduction of TB in Europe. Hence, in addition to the important questions addressed through systems biology, other factors related to the environmental, social and demographic contexts need to be studied using complementary approaches. Moreover, substantial genetic variation exists both among humans and within MTBC. This diversity needs to be considered to ensure that novel TB diagnostics, drugs and vaccines will be universally effective [29, 30].
Over the past 20 years, more than 100 studies have looked for possible associations between human gene polymorphisms and susceptibility to TB  (Figure 2b). The polymorphisms that were found reflect the influence of human genotype on disease, a factor which is not generally considered in current systems biology of infectious diseases. Similarly, at least 100 studies have explored the role of MTBC strain diversity in TB  (Figure 2c). Many experimental studies found clear evidence for strain effects on immune recognition and virulence, but only a few studies have reported consistent differences in clinical settings. To date, only five studies have explored possible interactions between human genotypes and MTBC genetic diversity [33–37]. More such studies are needed to understand the role of genetic diversity in host-pathogen interactions.
Given the diversity of factors driving TB in the world (Figure 2), a more comprehensive approach is needed, which complements and informs current TB research. We refer to this approach as ‘systems epidemiology’ (Figure 3), and envisage it as the combination of ‘classical’ systems biology, which addresses most of the biological aspects of TB, with epidemiology, sociology, evolutionary biology and ecology, which collectively cover the physical and social environments, as well as the evolution of the host and the pathogen . An improved understanding of the interplay between these many factors is a prerequisite to develop and successfully implement more effective tools and strategies to control TB in the world. For the remainder of this review, we shall focus on work carried out during the past decade, which illustrates how a better grasp of the genetic diversity and evolution of MTBC contributes to our understanding of TB.
Many methods have been used to genotype bacteria. The development of additional genotyping techniques has not always been justified. As a consequence, classifying strain diversity among bacterial species has often been confusing. In MTBC, the first genotyping methods were used to study the epidemiology and transmission dynamics of TB. Therefore, these techniques targeted molecular markers with fast evolutionary rates, such as insertion sequences (IS6110 RFLP), variable number tandem repeats (VNTRs) or CRISPRs (spoligotyping). However, these methods have limited utility when applied to long-term evolutionary questions. Hence, more slowly evolving markers have been used to track the main phylogenetic lineages of the MTBC. Because of the low level of DNA sequence diversity in MTBC compared to other bacteria, standard multilocus sequence typing (MLST), which is based on sequencing of a few housekeeping genes, does not generate sufficient phylogenetic information in MTBC. In addition to being genetically monomorphic, MTBC exhibits a clonal population structure with no evidence of ongoing horizontal gene transfer. Hence, a novel mutation occurring in the ancestor of any given MTBC lineage will be inherited by all members of this lineage and thus serve as a phylogenetic marker for all strains belonging to this lineage. This phenomenon has been taken advantage of, first for genomic deletion analysis [42–45], then for genotyping based on single nucleotide polymorphisms (SNPs) [46–50], and most recently, for comparative whole genome sequencing. Thus, rather than representing merely ‘yet another typing method’, the application of the latter three genotyping approaches has shed new light onto the evolution and biology of MTBC.
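Because a lineage-defining mutation is inherited by every descendant in a clonal population, assigning a strain to a lineage reduces in principle to a table lookup. The following is a minimal sketch of that idea and not a published typing scheme: the marker positions, alleles and lineage labels in `LINEAGE_MARKERS` are invented placeholders, `assign_lineage` is a hypothetical name, and the markers are assumed to define mutually exclusive (non-nested) lineages.

```python
from typing import Dict, Optional

# Hypothetical marker table: genome position -> (derived allele, lineage label).
# These values are made up for illustration; they are NOT real MTBC markers.
LINEAGE_MARKERS = {
    1001: ("A", "Lineage X"),
    2002: ("T", "Lineage Y"),
    3003: ("G", "Lineage Z"),
}


def assign_lineage(calls: Dict[int, str]) -> Optional[str]:
    """Return the lineage whose derived marker allele the strain carries.

    `calls` maps genome positions to the base observed in the strain.
    Returns None when no marker, or more than one marker, is matched.
    """
    hits = {
        lineage
        for position, (derived, lineage) in LINEAGE_MARKERS.items()
        if calls.get(position) == derived
    }
    if len(hits) == 1:
        return hits.pop()
    # Zero hits: strain lies outside the marker set.
    # Several hits: inconsistent with strict clonality, so flag it.
    return None
```

Under these assumptions, a strain matching two mutually exclusive markers would contradict clonal inheritance, which is why the sketch returns `None` in that case instead of guessing.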
One of the first genotyping methods applied to evolutionary questions in TB was based on genomic deletions. Two studies used genomic deletions as phylogenetic markers to revisit the classification of the different human and animal subspecies within MTBC [44, 45]. One of the important findings of these studies was that human MTBC did not, as previously assumed, originate from animal-adapted M. bovis during the initiation of agriculture and animal domestication. Instead, all human MTBC share a common ancestor with all animal MTBC, which most likely existed long before the Neolithic transition. Another study screened 100 clinical isolates of human MTBC by comparative genome hybridization and found that 68 different genome regions were deleted in one or more of these strains, corresponding to 4.2% of the coding capacity of the MTBC reference strain H37Rv . Some of these genomic deletions grouped strains into discrete lineages . A further screen based on genomic deletions in 875 strains from global sources revealed that human MTBC consists of six main lineages associated with different geographic regions and human populations (Figure 2c) . A molecular epidemiological investigation in San Francisco reported differential transmission of these lineages in different patient populations, depending on whether the infected patient came from a geographic area associated with a particular MTBC lineage or not. Based on these data, the authors hypothesized that the different lineages of MTBC might have co-evolved with different human populations [42, 43].
Taken together, these evolutionary studies support an ancient origin of human TB, and suggest that the phylogeographic distribution of MTBC and possible host-pathogen co-evolution need to be considered when developing new tools to combat TB .
Deletion analysis gave us insight into the phylogeographical population structure of MTBC, and pointed towards an association between the bacteria and different human populations. However, genomic deletions do not correlate with phylogenetic distances and therefore do not indicate how closely related one strain is to any other. In 2008, Hershberg et al.  published the results of DNA sequencing of 89 genes in each of 108 global strains of MTBC. This work resulted in a phylogeny that corroborated the deletion-based strain groupings, but now the genetic distances between strains could be evaluated. Based on the correlation between these genetic distances and the geographic distances between patient origins, the authors proposed a new evolutionary scenario for human MTBC. The so-called ‘out-of-and-back-to-Africa’ scenario postulated that MTBC originated in Africa. After dispersing around Africa, giving rise to two MTBC lineages also known as M. africanum , a first wave of phylogenetically ‘ancient’ MTBC accompanied the out-of-Africa migrations of modern humans and populated the south of India and Southeast Asia. Shortly thereafter or concomitantly, the phylogenetically 'modern' MTBC lineages spread out of Africa into Europe and Asia. These modern lineages then expanded as a result of the massive increases of human populations in Europe, India, and China during the past few centuries. Consistent with this scenario, a study by Wirth et al. based on VNTR analyses of a global collection of MTBC found signals of population expansion, which were more pronounced in MTBC strains originating from Europe or Asia compared to strains from Africa .
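The comparison of genetic and geographic distances described above can be illustrated with a Mantel-style permutation test. This is only a sketch under stated assumptions: the text here does not specify the statistical procedure used by Hershberg et al., the function names are hypothetical, and the distance matrices are toy values, not real data.

```python
import math
import random
from itertools import combinations


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Correlate two distance matrices; p-value from strain-label shuffles."""
    rng = random.Random(seed)
    n = len(genetic)
    geo_flat = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo_flat)
    at_least_as_high = 1  # the observed arrangement counts once
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Relabel strains by permuting rows and columns together.
        shuffled = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(shuffled), geo_flat) >= observed:
            at_least_as_high += 1
    return observed, at_least_as_high / (permutations + 1)


# Toy example: four strains whose genetic distance is proportional
# to the geographic distance between patient origins.
positions_km = [0.0, 1000.0, 5000.0, 9000.0]
geographic = [[abs(a - b) for b in positions_km] for a in positions_km]
genetic = [[d / 1000.0 for d in row] for row in geographic]
r, p_value = mantel(genetic, geographic)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling strain labels preserves that dependence under the null hypothesis.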
This new evolutionary scenario for MTBC is not just an interesting academic exercise, but leads to a new hypothesis with respect to the evolution of virulence in MTBC and its possible impact on disease progression. This hypothesis builds on previous findings in evolutionary ecology suggesting that increased access to hosts or increased host density can select for increased virulence. Based on the evolutionary scenario proposed by Hershberg et al., 'modern' MTBC lineages have evolved in the context of high human population densities (i.e. easy access to susceptible hosts), whereas 'ancient' lineages evolved in regions where human host densities remained low until very recently. This new hypothesis postulates that 'modern' strains became more virulent as they adapted to large host populations, while 'ancient' strains remained less virulent in order to avoid exhausting the limited pool of susceptible hosts. When re-formulating this hypothesis in terms of the spectrum of latent versus active TB discussed above, we expect that a person infected with a 'modern' strain is more likely to experience the condition in which the infecting mycobacteria are able to survive the initial contact with the human immune system and replicate actively, leading to faster disease progression (Figure 1). By contrast, infection with an 'ancient' strain is expected to be associated with the condition in which the infection is better controlled by the host immune system, increasing the likelihood of latent TB. As we shall discuss below, there are emerging data from both the laboratory and the field that are consistent with this idea.
Research in immunology and cell biology has illuminated many aspects of host-pathogen interaction in TB [56, 57]. An increasing body of evidence from cellular and animal infection models demonstrates that MTBC strains differ in their immunogenicity and virulence [32, 58]. However, most studies to date have examined a limited number of MTBC strains. A more comprehensive effort was recently reported by Portevin et al. In that study, 26 strains representative of the global diversity of human MTBC were selected and used to infect monocyte-derived macrophages and dendritic cells from multiple human donors. The authors found that strains belonging to the ‘modern’ lineages induced lower pro-inflammatory responses in both cell types compared to the ‘ancient’ lineages (Figure 4a). Importantly, these differences were maintained across eight different donors. However, no such differences were observed when infecting unfractionated peripheral blood mononuclear cells. Considering that hypo-inflammatory innate immune responses have been associated with hypervirulence in animal models of TB [60, 61], the findings by Portevin et al. support a model in which 'modern' lineages might be able to progress more rapidly to disease and transmit to new hosts by avoiding early immune recognition. If confirmed, this would be consistent with the idea that ‘modern’ lineages have evolved under higher host densities, which in turn might have selected for increased virulence and transmissibility.
A study by de Jong et al. in the Gambia followed a cohort of TB patients and their household contacts over two years. The authors found no difference in the rate of transmission between the 'modern' lineages of MTBC and the 'ancient' lineage known as M. africanum. However, they found that individuals infected with 'modern' strains were three times more likely to progress to active disease than those infected with M. africanum (Figure 4b). Taken together, these studies suggest that 'modern' MTBC strains differ from ‘ancient’ strains in a way consistent with the hypothesis on the evolution of virulence in MTBC presented above. However, more studies are needed to confirm these findings and determine how the current trends of globalisation, urbanisation and general population growth will impact the virulence of MTBC in the future (Box 3).
The main selective pressures acting on a pathogen include the host immune response, exposure to antibiotics, and the host population dynamics. As explained in the main text, the role of the immune system in shaping pathogen diversity is starting to be elucidated and pressure by antibiotics is well known to have resulted in the emergence of drug-resistant microbes. The impact of human population changes, however, has been rarely explored. Human population sizes have been changing from small hunter-gatherer populations to large urban settlements in Europe, Asia and America (Figure Ia). It is therefore expected that given the larger number of susceptible hosts available, more virulent strains will evolve in densely populated areas . Recent reports suggest that this could be the case. It has been shown that strains from MTBC lineages associated with Europe and Asia (i.e. the 'modern' MTBC) are associated with a delayed pro-inflammatory immune response (Figure 4a)  and a shorter progression to active TB (Figure 4b) . Increased pathogen pressure, associated with higher human densities, has been shown to select for higher frequencies of resistance alleles against infection. This is the case of a human genetic variant in the gene SLC11A1 known to provide resistance against intracellular pathogens. The frequency of this allele in human populations correlates with the time of first urbanization in different geographic areas (Figure Ib) . The selection for human resistant variants associated with regions of increased human settlement suggests that the selective pressure of TB on humans might be higher in urban than rural areas, perhaps because of the emergence of more virulent strains. Likewise, the continued increase of human populations all over the world, and especially in Africa, suggests that more virulent strains of MTBC could emerge in the future .
One of the features of host-pathogen co-evolution is the ongoing evolutionary arms race between the pathogen and the host immune system. T-cell based immunity is essential for the control of human TB. This is illustrated by the fact that HIV-infected individuals with low CD4+ T-cell counts are at high risk of developing active TB. Many pathogens avoid immune recognition through the accumulation of genetic diversity in antigens, a phenomenon known as immune evasion. Two early studies suggested that no immune evasion was occurring in MTBC [50, 64]. However, these studies were limited with respect to the number of antigens studied. By May 2010, the Immune Epitope Database and Analysis Resource (Box 2) comprised a total of 491 experimentally confirmed human T-cell epitopes of MTBC, corresponding to 78 antigens in the MTBC genome. This offered an opportunity to revisit the question of immune evasion in MTBC based on a larger dataset. Comas et al. generated the nearly complete genome sequences of 21 clinical strains representative of the global diversity of human MTBC. They then inferred a new phylogeny based on the concatenation of all 9,037 phylogenetically informative SNPs identified among these strains. This new genome-based phylogeny of MTBC was congruent with the phylogenies published earlier, but had a higher resolution. For example, the basal position of the two M. africanum lineages corroborated the likely African origin of MTBC. In addition, this new phylogeny allowed the ancestral state of all the SNP positions identified in these 21 genomes to be reconstructed, and the direction of the SNP change to be determined at each position. The authors then extracted all the known T-cell antigens and classified the remainder of the genome into essential and non-essential genes based on previous experimental work. As expected, essential genes were more evolutionarily conserved than non-essential genes.
Surprisingly, however, the known T-cell antigens turned out to be as conserved as essential genes. Because antigens consist of epitopes, which are immunologically recognised, and non-epitope regions, which are not, the authors studied these regions separately. The results showed that 95% of epitopes in MTBC had no amino acid change at all. Furthermore, epitopes were more evolutionarily conserved than non-epitope regions, and overall were the most conserved regions in the entire genome.
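A figure such as the share of epitopes without any amino acid change can be computed from per-region alignments across strains. The snippet below is a simplified illustration only, not the analysis pipeline of Comas et al.: the function name `fraction_invariant` is hypothetical and the peptide sequences are invented toy strings.

```python
from typing import Dict, List


def fraction_invariant(alignments: Dict[str, List[str]]) -> float:
    """Fraction of regions whose amino acid sequence is identical in all strains.

    `alignments` maps a region name to its peptide sequence in each strain.
    """
    if not alignments:
        raise ValueError("no regions supplied")
    invariant = sum(
        1 for sequences in alignments.values() if len(set(sequences)) == 1
    )
    return invariant / len(alignments)


# Toy data: four made-up epitopes, each observed in three strains.
# Only "epitope_3" carries an amino acid change (E -> A in the third strain).
toy_epitopes = {
    "epitope_1": ["MKLVNAAT", "MKLVNAAT", "MKLVNAAT"],
    "epitope_2": ["GGSTPLLR", "GGSTPLLR", "GGSTPLLR"],
    "epitope_3": ["LLDEGKQS", "LLDEGKQS", "LLDAGKQS"],
    "epitope_4": ["QWERTYIP", "QWERTYIP", "QWERTYIP"],
}
share = fraction_invariant(toy_epitopes)  # 3 of 4 regions are invariant
```

Applying the same function separately to epitope and non-epitope regions would give a crude comparison of their conservation; it ignores region length and synonymous changes, which a real analysis would need to account for.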
Based on these findings, the authors hypothesized that this hyperconservation of T-cell epitopes in MTBC might reflect the fact that the host immune responses to these epitopes are beneficial to the bacteria rather than to the host. In other words, MTBC does not seem to rely on immune evasion, but rather on some type of immune subversion. Some characteristics of the life history of TB support this view. For example, cavitary TB, the most contagious form of the disease , is primarily driven by immune-pathological processes which promote lung damage and thereby increase the transmissibility of TB [67, 68].
The hyperconservation of human T-cell epitopes in MTBC has important implications for the design of new TB diagnostics and vaccines. On the positive side, new diagnostics based on these epitopes will be universally applicable. By contrast, developing vaccine candidates targeting these epitopes could become problematic if the human immune responses elicited offer a net benefit to MTBC.
In addition to generating new insights into the genetic diversity and evolution of MTBC, next-generation genome sequencing will increasingly be used to address more pragmatic public health-driven questions (see  and accompanying papers). The most obvious application in this respect is in the emerging field of genome epidemiology [70, 71]. In MTBC, next-generation sequencing has been used to highlight the limitations of current genotyping techniques for differentiating between closely related strains. In a study from The Netherlands, a cluster of TB transmission comprising 104 patients was investigated by genome sequencing of three of the corresponding patient isolates. Eight SNPs were identified among these three isolates and used to genotype the remaining patient isolates of the cluster. This approach made it possible to identify the index cases and clarify the routes of transmission. More recently, the genome sequences of 32 patient isolates from a transmission cluster in British Columbia were reported. The SNP data were integrated with information about the social contacts of the patients. The results highlighted the role of ‘superspreaders’ in the transmission of TB. There is little doubt that next-generation sequencing will increasingly be used for routine molecular epidemiology of TB in the future.
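The idea of using a handful of SNPs to resolve relationships within a transmission cluster can be sketched as a pairwise comparison of SNP profiles. This is an illustrative sketch under stated assumptions, not the method of the studies cited above: the isolate names, allele strings and the `max_snps` threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def snp_distance(profile_a: str, profile_b: str) -> int:
    """Number of variable positions at which two isolates differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same SNP positions")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def close_pairs(
    profiles: Dict[str, str], max_snps: int
) -> List[Tuple[str, str, int]]:
    """List isolate pairs separated by at most `max_snps` differences."""
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(
        sorted(profiles.items()), 2
    ):
        distance = snp_distance(prof_a, prof_b)
        if distance <= max_snps:
            pairs.append((name_a, name_b, distance))
    return pairs


# Toy profiles: the allele of each isolate at eight variable positions.
toy_profiles = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "TCGAACTA",
}
linked = close_pairs(toy_profiles, max_snps=2)
```

Pairs below the threshold are candidates for a direct transmission link; in practice such genetic evidence would be interpreted together with contact-tracing data, as in the British Columbia study described above.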
In conclusion, systems biology approaches are crucial to address some of the most urgent questions in TB research (Box 1). Answering these questions will be essential to drive the development of new drugs and vaccines against TB. However, many more questions need to be addressed, and we think that an even more comprehensive approach will be necessary to understand TB in all its complexity. We have been referring to this multidisciplinary concept as systems epidemiology (Figure 3). Some of the new high-throughput technologies routinely used in systems biology are increasingly being introduced into other disciplines. In particular, high-throughput DNA sequencing, which continues to provide novel insights into biology, has the potential to revolutionize molecular and genetic epidemiology of infectious diseases, as the genome sequences of both the infecting agent and the patient will soon be determined routinely. The rationale for studying the genomic diversity of both MTBC and its human host is clear, as mounting evidence supports the relevance of this diversity for understanding TB [31, 32]. In particular, the possible consequences of host-pathogen co-evolution in TB have only recently been addressed [51, 59, 62]. Some of the resulting observations seem contradictory at first glance. However, they can be reconciled in a model embedded in ecological theory on the evolution of virulence (Box 3). Following this model, the hyperconservation of T-cell epitopes in MTBC suggests that all MTBC strains depend on some aspects of the host adaptive immune response for successful transmission. By contrast, the variation in innate immune responses reported by Portevin et al. reflects subtle differences between strains that are directly or indirectly linked to different rates of progression to active disease, and which perhaps have been selected as a consequence of changes in human host densities.
More work is needed to confirm this model and explore its implications for the global control of TB.
We thank Andrés Moya and Douglas Young as well as other members of our group for valuable comments on the manuscript. The work in our laboratory is supported by the Swiss National Science Foundation (grant no. PP00A-119205), the Medical Research Council, UK (MRC_U117588500), the Leverhulme-Royal Society Africa Award (AA080019), and the National Institutes of Health (AI090928 and HHSN266200700022C).