|Home | About | Journals | Submit | Contact Us | Français|
Mycobacterium tuberculosis (Mtb) has generated a global health catastrophe that has been compounded by the emergence of drug resistant Mtb strains. We used whole genome sequencing to compare the accumulation of mutations in Mtb isolated from cynomolgus macaques with active, latent and reactivated disease. Based on the distribution of SNPs observed, we calculated the mutation rates for these disease states. Our data suggest that Mtb acquires a similar number of chromosomal mutations during latency as occurs during active disease or in a logarithmically growing culture over the same period of time despite reduced bacterial replication during latent infection. The pattern of polymorphisms suggests that the mutational burden in vivo is due to oxidative DNA damage. Thus, we demonstrate that Mtb continues to acquire mutations during latency and provide a novel explanation for the observation that isoniazid monotherapy for latent tuberculosis is a risk factor for the emergence of INH resistance1,2.
In active tuberculosis, patients harbor large numbers of replicating organisms and are treated with multiple antibiotics to prevent the emergence of novel drug resistance mutations. In contrast, Mtb from latent infection is thought to have little capacity for mutation and is typically treated with a single antibiotic, isoniazid (INH). Recent epidemiologic studies have found that INH preventive monotherapy (IPT) for latent tuberculosis is associated with INH resistance1,2. In Mtb, all drug resistances are the result of chromosomal mutations and depend on the bacterium's capacity for mutation during the course of infection. Therefore, we seek to define the mutational capacity of the bacterium during infection to better predict the rate at which drug resistance can be expected to emerge in active, latent, and reactivated disease.
Conventional approaches to measuring bacterial mutation rates cannot be applied to Mtb in vivo. However, high-density whole genome sequencing (WGS) allows us to assess the capacity of Mtb for mutation over the course of infection with minimal bias and maximum sensitivity3–5. As the nonhuman primate is the only animal model that mimics the broad range of disease seen in human tuberculosis6,7, we performed WGS on the infecting strain of Mtb, Erdman, and 33 isolates from nine cynomolgus macaques that represented the three major clinical outcomes of infection (active disease, persistently latent infection and spontaneously reactivated disease after prolonged latency7) (Fig. 1). Genome coverage averaged 93% across these isolates, and average read depth was 117× across the genomes (Supplementary Table 1). Putative polymorphisms were identified using both a scaffolded approach8,9 and a de novo assembly method10, and polymorphic sites were validated by Sanger sequencing or through independent identification by WGS. Through these analyses, we identified 14 unique single nucleotide polymorphisms (SNPs) (Fig. 2). There was no evidence that these SNPs were present in the inoculum either from repeated deep sequencing and PCR resequencing of the inoculum, or from shared polymorphisms between bacteria from different lesions. While we have used WGS previously to detect insertions and deletions9, we did not detect either in the 33 genomes analyzed. Within lesions, we identified both shared and independent polymorphisms as would be expected if the SNPs accrued within lesions over the course of the infection.
We next sought to quantify the average mutation rate of the bacterium in the different stages of clinical disease. The mutation rate (μ) of a bacterium in vivo can be estimated from the number of mutations (m) that have occurred for a genome of known size (N) over a known number of generations (t/g where t is length of infection and g is generation time). However, the generation time of Mtb in humans or non-human primates is unknown. In vitro, Mtb has a generation time of approximately 20 hours under nutrient rich conditions11. In mice, the bacterial organ burden increases at roughly this rate during the first weeks of infection, but the subsequent onset of the adaptive immune response causes bacterial replication to slow substantially or cease entirely12,13. In clinical latency in nonhuman primates and humans, the immune response limits infection to the point that there are no clinical or radiographic signs of overt disease. This is thought to be associated with a dramatic slowing or cessation of bacterial replication, although we have recently shown that lesions from clinically latent cynomolgus macaques display a range of histopathologies and bacterial burdens, suggesting a spectrum of bacterial physiologies may occur in latency14, 15.
Because of the inherent uncertainty in the generation time of Mtb in vivo, we estimated the mutation rate across a broad range of generation times (18–240 hours), calculating the rate that would be required to generate the number of polymorphisms identified by WGS (Fig. 3a–c). In order to compare the mutation rate of bacteria from each clinical condition, we derived a lower limit for the bacterial mutation rate in vivo, which we define as the predicted mutation rate per generation if the in vivo generation time were equivalent to the in vitro generation time of 20 hours, μ(20hr). While Mtb is likely to have a much longer generation time in vivo, especially during prolonged latent infection, we use μ(20hr) as a highly conservative boundary estimate of the in vivo mutation rate that allows us to directly compare the mutational capacity of the bacterium in different in vivo conditions. Strikingly, we found that the bacterial population's capacity for mutation, μ(20hr), during latency (2.71×10−10) and reactivated disease (3.03×10−10) is equivalent to that of Mtb from animals with active disease (2.01 ×10−10) (Table 1). Mutation rate can also be calculated as the number of mutations that occur per day of infection rather than per generation. We therefore calculated the mutation rate per day required for the bacterial populations in each disease state to acquire the number of polymorphisms that we identified by WGS (Fig. 3d). Our data indicate that in macaques with active, latent and reactivated disease, the bacterial populations acquire mutations at the same rate over time, regardless of the number of bacterial replications that have occurred.
We then sought to benchmark these rates against the mutation rate of the bacterium in vitro. Luria and Delbrück fluctuation analysis measures the rate at which coding polymorphisms conferring a selectable phenotype arise under stable in vitro conditions16. While standard and widely applied, this approach is limited in that it only measures the rate of a small set of coding mutations within a single region of the genome and is therefore not as sensitive as WGS. However, the fluctuation analysis derived mutation rate can be converted to a per base mutation rate by defining the number of mutations conferring resistance17 and then compared to the mutation rate determined with WGS. Using fluctuation analysis and scoring for the acquisition of rifampicin resistance, we found that the rate of resistance is 2.1×10−09 (Supplementary Fig. 1a,b), consistent with or slightly higher than previously published values for Mtb18,19. We then used Sanger sequencing to define the number of coding mutations conferring rifampicin resistance under our growth conditions and found it occurred through ten unique polymorphisms, consistent with previous reports20 (Supplementary Fig. 1c). Dividing the phenotypic rate by the target size, we determined that the in vitro mutation rate of the inoculating strain (Erdman) is μin vitro=2.1×10−10. Thus μ(20hr), our conservative estimate of the in vivo mutation rate from every disease state, is highly similar to the bacterium's mutation rate observed during in vitro growth.
Why does the bacterial population in macaques with clinically latent infection acquire mutations at a similar rate to rapidly replicating bacteria in vitro? One possibility is that Mtb could be actively dividing during the entire course of prolonged clinical latency, perhaps balanced by robust killing. However, though the generation time of Mtb in animals or humans with latent infection is unknown, several lines of evidence suggest that Mtb replication slows during clinical latency12–15. If the generation time slows, the mutation rate would have to be proportionally higher to generate the number of mutations observed. For example, if Mtb from animals with latent infection replicate on average every 135 hours, as in mice after ten weeks of chronic infection12, the bacterial population must have an average mutation rate per generation of 1.80×10−09, nearly an order of magnitude greater than the in vitro mutation rate (Table 1).
An alternative interpretation is that the mutational capacity of Mtb during latent infection is determined primarily by the length of time the organism spends in the host environment rather than the replicative capacity and replicative error of the organism during infection. We noted that eight of the ten polymorphisms that we identified in our isolates from animals with persistent latent or reactivated latent infection are possible products of oxidative damage, either cytosine deamination (GC>AT) or the formation of 8-oxoguanine (GC>TA) (Fig. 4a). This is consistent with the model that Mtb faces an oxidative environment in the macrophage phagolysosome21,22 and data indicating that genes involved in the repair of oxidative damage are essential for bacterial survival during murine infection23. In addition, we found that the pattern of polymorphisms in Mtb from cynomolgus macaques is similar to the pattern of neutral polymorphisms that emerged during the evolution of extensively drug resistant Mtb strains in patients from South Africa (Fig. 4a)9. Thus, the mutational capacity of Mtb during latent infection as well as the spectrum of those mutations suggests that the dominant source of mutation during latency is oxidative DNA damage rather than replicative error24 (Fig. 4b). This may occur because the immune response that results in latent infection causes more oxidative damage to the bacterial DNA14,25 or because a portion of the bacteria may enter a metabolically quiescent state in which DNA repair is diminished26,27.
Thus, using WGS, we demonstrate that Mtb has greater mutational capacity in latency and early reactivation disease than predicted by in vitro measurements of mutation rate and estimates of in vivo generation time. These data indicate that Mtb retains the ability to acquire drug resistance mutations during latency. The rate at which clinical drug resistance will emerge after IPT treatment of latently infected individuals harboring an initially drug-sensitive population also depends upon the number of bacteria in a latently infected individual and the rate of reactivation, which is low in immunocompetent individuals. Indeed, there is only a modest increased risk of INH resistance after IPT in immunocompetent populations1–2, some of which may be attributable to selective killing of susceptible bacterial populations, leaving only resistant populations to reactivate28. Our results suggest that in addition to these mechanisms, part of the increased risk of INH resistance after IPT may be due to selection of monoresistant mutants that arise during latency. IPT is now being recommended globally for HIV+ individuals with clinically latent tuberculosis where bacterial burden and the rate of treatment failure may be higher because of immunocompromise29. If our data from the macaque model are predictive of the mutational capacity of Mtb in HIV+ individuals, INH monoresistance could arise at a substantial rate. These findings emphasize the importance of drug resistance testing and careful monitoring for treatment failure in these patient populations.
Animals were infected as described previously7 via bronchoscopy with a small number (~25) of organisms. Like humans, macaques developed either active disease or controlled latent infection. In latency, animals became clinically asymptomatic, without microbiologic or radiographic evidence of disease. Clinically latent animals were followed as described previously7 for prolonged periods of time in the absence of treatment. Spontaneous reactivation of latent infection occurred in a small number of animals. Animals were euthanized at the indicated times after infection and lesions identified on necropsy were plated for bacterial colonies (Fig. 2)7. Colonies from necropsy were subsequently streaked onto LJ slants and expanded for extraction of genomic DNA. Minimal expansion occurred between isolation of strains and extraction of genomic DNA.
Two μg of genomic DNA from each isolate were used for sequencing with the Illumina Genome Analyzer (Illumina). Single-read & paired-end read sequencing was performed with read lengths of 36 bases or 75 bases and a target coverage of at least 3 million high quality bases. Libraries were prepared using standard sample preparation techniques recommended by the manufacturer. Libraries were quantified using a Sybr qPCR protocol with specific probes for the ends of the adapters. The qPCR assay measures the quantity of fragments properly adapter-ligated that are appropriate for sequencing. Based on the qPCR quantification, libraries were normalized to 2nM and then denatured using 0.1 N NaOH. Cluster amplification of denatured templates occurred according to manufacturer's protocol using V2 Chemistry and V2 Flowcells (1.4mm channel width). Sybr Green dye was added to all flowcell lanes to provide a quality control checkpoint after cluster amplification to ensure optimal cluster densities on the flowcells. Flowcells were sequenced on Genome Analyzer II's, using V3 Sequencing-by-Synthesis kits and analyzed with the Illumina v1.3.4 pipeline. Standard quality control metrics including error rates, % passing filter reads, and total Gb produced were used to characterize process performance prior to downstream analysis. Paired-end reads of 51 bases were acquired and analyzed as described previously9. Short sequence read data is available on the NCBI SRA (accession numbers in Supplementary Table 1) and on an independent site hosted by the Broad Institute and linked through the TB Database (see URLs).
Two read lengths were generated (Supplementary Table 1). Prior to mapping or assembly, the 2×75 bp reads were trimmed to 48 bases and filtered. Any read containing an unknown base was discarded. Reads with homopolymeric runs of A/Ts greater than nine bases or G/Cs greater than ten bases were discarded. Reads with an average quality score of less than 20 were removed. On average greater than 8,000,000 reads were retained after filtering. The 36 bp reads were not filtered. For de novo assembly, reads were processed with Edena v2.1.110 in overlapping mode with the default parameters to allow for the detection of insertions and deletions as well as SNPs. In assembly mode, “strict” was enforced and independent assemblies were generated with length overlaps ranging from position 23 to 37 bases. Assemblies generating the largest N50 values were selected for polymorphism discovery. Each assembly was analyzed by pair wise comparison using the MUMmer script, dnadiff31. Polymorphisms were further processed from the `SNP' files with Perl scripts and mapped to the reference genome H37Rv (Genbank Accession AL123456). For scaffolded assembly, Illumina reads were mapped to the reference genome Haarlem with MAQ v0.7.18. Illumina fastq files for each pair were converted with sol2sanger, individually mapped to the reference and merged together for each isolate. Three mismatches in the alignment seed were allowed during mapping. A minimum read depth of ten was required to call SNPs and remaining parameters were defaults from the script easyrun. For 51bp paired end reads, a minimum read depth of five was required to call SNPs. For 36bp reads, reads were aligned to the reference using easyrun defaults except that we allowed up to three mismatches in the seed. For the Erdman inoculum, four runs were merged to generate the assembly. As this represented the first WGS of the Erdman strain of Mtb, multiple Mtb finished genomes were used as references in preliminary alignments. The Haarlem sequence resulted in the fewest number of SNPs and was selected as the reference sequence for the remainder of the alignments (see URLs). Only sites of difference between the experimental isolates were pursued for further analysis. A master list of sites was created and calls for each site from all samples were extracted with the MAQ command “pileup”, combined into a table and inspected manually. All polymorphic loci were validated either by Sanger sequencing using the indicated primers (Supplementary Table 2) or by independent identification by WGS.
Mutation rate was estimated from the number of SNPs observed in each clinical condition. Our equations assume that both mutation rate and growth rate are parameters that, while potentially dynamic, can be averaged across the lifetime of the bacterium. Additionally, we assume that the number of mutations (m) is an accurate assessment of mutation rate during the life of the cell. SNPs observed multiple times within the same lesion were assumed to have arisen once and then replicated; as such they were each only counted once. Equation (1) describes the estimation of the mutation rate of a single strain as described in Table 1.
Mutation rate (μ) is determined by dividing the number of SNPs (m) by the genome size (N) times the number of generations (t/g). m is defined by the number of SNPs observed, N is determined based on 91% coverage of a 4.4Mb genome (N=4 × 106), t is the total duration of each infection in hours, and g is the generation time in hours. The application of this equation to a clinical condition is described by Equation (2). Samples were binned according to clinical condition and a representative mutation rate was estimated for each condition. Binning allows us to conservatively assess the distribution of mutations in each condition.
In Equation (2), the sum of the SNPs observed (mi) in a condition is divided by the genome size (N) multiplied by the sum of the number of replications possible (ti / g). The number of replications possible is calculated by dividing the total length of infection (in hours, ti) for strain i by the generation time in hours. The generation time, g, was varied from 18 hours to 240 hours to capture the maximum range of biologically plausible generation times. All calculations were performed in Matlab (Mathworks, Natick MA, USA). Estimates of mutation rate and 95% confidence intervals were determined using the poissfit function. Additional probability values were generated for each value of g using the binopdf function. The binopdf parameters and values matched exactly those produced by poissfit, reflective of the ability of the Poisson distribution to approximate the binomial distribution when Npoisson is large and ppoisson is small. Thus, binopdf was used to calculate the probability density function for the observed number of mutations given a mutation rate and a fixed value for g, while poissfit was used to calculate the estimates of μin vivo and the 95% confidence intervals.
To determine the in vitro mutation rate, we performed fluctuation analysis as previously described18. Briefly, 20 independent cultures of 1.08 × 109 cells each in 4mL of 7H9 supplemented with OADC, 0.05% Tween-80 and 0.5% glycerol were plated onto 7H10 plates supplemented with OADC, 0.05% Tween-80, 0.5% glycerol and 2μg/mL of rifampicin. The number of mutations per culture (mMSS) was calculated from the distribution of mutants using the MSS method16 calculated by the Matlab scripts described by Lang et al17. Phenotypic mutation rate was estimated by dividing mMSS by the number of cells plated (Nt = 1.08 × 109). The number of rpoB mutations conferring rifampicin resistance in our assay was determined by amplifying the resistance region of rpoB from 96 independent isolates from fluctuation analysis. A single base mutation rate (μin vitro) was calculated by dividing the rifampicin resistance rate by the number of mutations conferring rifampicin resistance.
We identified the synonymous mutations that distinguished the sequenced drug susceptible, MDR and XDR strains of Mtb from KwaZulu Natal, South Africa from each other using previously published data9. Polymorphisms were called in reference to the sequenced F11 strain of Mtb and polymorphisms in repetitive and IS elements, PE, PPE and PGRS genes and pks12 were excluded from this analysis because these genes contain large, near perfect repeats that create a high likelihood of sequencing and assembly error.
This work was supported by a New Innovator's Award, DP2 0D001378, from the Director's Office of the National Institute of Health to SMF, by a subcontract from NIAID U19 AI076217 to SMF, by NIH RO1 HL075845 to JLF, and by the Bill and Melinda Gates Foundation (JLF). The genome sequencing has been funded in part with federal funds from the National Institute of Allergy and Infectious Disease, US National Institutes of Health (NIH), US Department of Health and Human Services, under contract no. HHSN266200400001C. The project described was supported in part by Award Number U54GM088558 to ML from the National Institute Of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute Of General Medical Sciences or the National Institutes of Health. We thank D. Gurgil and J. Xu of the Enterprise Research IS group at Partners Healthcare for their support and for provision of the HPC facilities, and E. Klein for necropsy and pathology of the infected monkeys, as well as the veterinary technical staff for care of the animals. We also thank E. Rubin, C. Sassetti, B. Bloom, T. Rosebrock, and B. Aldridge for helpful feedback.
URLs: Mycobacterium tuberculosis Sequencing Project, Broad Institute of Harvard and MIT (http://www.broadinstitute.org/)
Genomic Data for the 34 M. tuberculosis Erdman strains sequenced: (http://www.broadinstitute.org/annotation/genome/mtb_monkey_reseq.1)
Tuberculosis Database (http://www.tbdb.org)
Competing Financial Interests. The authors declare no competing financial interests.