It was reported previously that the major fraction of the recent decrease in incident tuberculosis cases in Arkansas was due to a decrease in reactivated infections. Preventing transmission of Mycobacterium tuberculosis is the key to a continued decline in tuberculosis cases. In this study, we integrated epidemiological data analysis and comparative genomics to identify host and microbial factors important to tuberculosis transmission. Compared with cases in small clusters (containing 2–5 cases), a significantly higher proportion of cases in large clusters (containing >10 cases) were non-Hispanic black, homeless, younger than 65 years, male, sputum smear-positive, excessive users of alcohol, and HIV sero-positive, with clustered cases defined as those diagnosed within one year of each other. However, being non-Hispanic black and being homeless within the past year were the only two host characteristics identified as independent risk factors for being in large clusters. This finding suggests that social behavioral factors have a more important role in transmission of tuberculosis than does the infectiousness of the source. Comparing the genomic content of one of the large cluster strains to that of a non-clustered strain from the same community identified 25 genes that differed between the two strains, potentially contributing to the observed differences in transmission.
Following a peak in tuberculosis cases in the United States in 1992, the incidence of tuberculosis has decreased by approximately half, with 11,540 cases reported in 2009. Although the tuberculosis incidence rate in 2009 was reported by the CDC to be the lowest recorded since 1953, when national tuberculosis reporting began, the annual percentage decline in the tuberculosis incidence rate has slowed from 7.3% per year during 1993–2000 to 3.8% during 2000–2007 [1–5]. Furthermore, tuberculosis incidence rates vary greatly among the states. Arkansas, which once had incidence rates above the national average, had a 60% decrease in tuberculosis cases from 1992 to 2006, and the incidence rate for the state has now been below the national average for the past seven years [1–6]. A study by France et al. used DNA genotyping of Mycobacterium tuberculosis isolates from Arkansas to distinguish between cases of tuberculosis resulting from recent transmission (clusters of isolates having similar DNA genotyping patterns) and cases resulting from reactivation of a latent infection acquired in the past. By examining the changes in the incidence of recently transmitted disease and reactivation disease, it was observed that the overall decline in tuberculosis cases in Arkansas resulted primarily from declining rates of reactivation disease, and less so from declining rates of recent transmission.
Previous studies have identified risk factors for clustering including male gender, non-Hispanic black race, younger age, homelessness, alcohol use, or intravenous drug use within the past 12 months, HIV positivity, cavitary disease, and sputum smear positivity [8–11]. More recent studies have focused on examining host factors that are predictive of large clusters, which represent either more extensive transmission of the strain or a higher proportion of primary tuberculosis among those infected with the strain. A study by Kik et al. compared the host characteristics of the first two cases in a cluster between small (2–4 cases) and large (≥ 5 cases) clusters and found that a short interval (< 3 months) between the diagnosis of the first two patients, age < 35 years, urban residence, and sub-Saharan African nationality were independent predictors of large clusters.
Preventing transmission of M. tuberculosis is the key to a continued decline in tuberculosis cases. A better understanding of the host, environmental, and bacterial factors that are associated with clustering will inform strategies to prevent M. tuberculosis transmission. This study investigated host risk factors for belonging to a cluster of tuberculosis cases as well as belonging to a large cluster, and also examined bacterial factors involved in clustering by comparing the genetic content of a large cluster strain to that of a non-clustered strain from the same community.
The study population included 993 tuberculosis cases having M. tuberculosis isolates collected between January 1, 1996 and December 31, 2003, representing 70.8% of all incident cases of tuberculosis and 96.9% of the bacteriologically confirmed cases during that time period. Patient information for each case was collected using the CDC Report of Verified Case of Tuberculosis (RVCT) form. The data collected included patient demographics (sex, race/ethnicity, age, city and county of residence, and homelessness or excessive alcohol use within the past year), clinical characteristics (site of disease, chest radiograph results, sputum culture and smear results, and HIV co-infection), and treatment regimen. The study protocols and procedures for the protection of human subjects were approved by the Health Sciences Institutional Review Boards of the University of Michigan and the University of Arkansas for Medical Sciences.
These cases had previously been classified as clustered or non-clustered based on IS6110 fingerprinting and spoligotyping of the isolates. For isolates having six or more IS6110 copies, isolates with identical IS6110 fingerprints, or isolates with IS6110 fingerprints differing by one band and having identical spoligotype patterns, were designated as clustered. For isolates with fewer than six IS6110 copies, isolates having identical IS6110 fingerprints and spoligotypes were designated as clustered. To increase the specificity of genotypic clustering as a measure of recent transmission, cases having matching genotypes by the above criteria also had to have been diagnosed within one year of each other to be considered clustered. By these criteria, 392 cases shared identical or highly similar genotype patterns with another isolate collected from the Arkansas population during the same time period.
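The pairwise clustering criteria described above can be sketched in code. This is an illustration only, not the procedure used to assign clusters in the study. It assumes that each isolate is a hypothetical record with a set of IS6110 band positions (`bands`), a spoligotype label (`spoligotype`), and a diagnosis date (`diagnosed`); that "differing by one band" means exactly one band is present in one fingerprint and absent from the other; that the high-copy rule applies only when both isolates have six or more bands; and that "within one year" means at most 365 days apart.

```python
from datetime import date


def is_clustered_pair(a, b):
    """Return True if two isolates meet the genotype and time criteria.

    Each isolate is a dict with hypothetical keys:
      'bands'       - frozenset of IS6110 band positions
      'spoligotype' - spoligotype pattern label
      'diagnosed'   - datetime.date of diagnosis
    """
    same_spoligotype = a["spoligotype"] == b["spoligotype"]

    if min(len(a["bands"]), len(b["bands"])) >= 6:
        # High-copy rule: identical fingerprints, or a one-band difference
        # combined with an identical spoligotype.
        n_different = len(a["bands"] ^ b["bands"])
        genotype_match = n_different == 0 or (n_different == 1 and same_spoligotype)
    else:
        # Low-copy rule: identical fingerprint and identical spoligotype.
        genotype_match = a["bands"] == b["bands"] and same_spoligotype

    # Time criterion (assumed here to mean at most 365 days apart).
    within_one_year = abs((a["diagnosed"] - b["diagnosed"]).days) <= 365
    return genotype_match and within_one_year
```

Under these assumptions, a full cluster assignment would additionally require grouping isolates that are linked to one another through such pairwise matches, which is not shown here.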
To identify host risk factors for clustering of tuberculosis cases in the Arkansas population, the distribution of previously identified host risk factors for TB transmission [8–11] was first compared between the 392 clustered cases and the 601 non-clustered cases by chi-square or Fisher’s exact test, as appropriate. The host risk factors analyzed were non-Hispanic black race, male sex, age less than 65 years, homelessness, alcohol use, or intravenous drug use within the past 12 months, HIV positivity, cavitary disease, and sputum smear positivity. To identify host risk factors for large clusters of tuberculosis cases, the clustered cases were divided into three groups (Figure 1): large cluster cases (in clusters containing >10 cases), medium cluster cases (in clusters containing 6–10 cases), and small cluster cases (in clusters containing 2–5 cases). The distribution of host demographic characteristics was then compared between the small and medium cluster groups, and between the small and large cluster groups, by chi-square or Fisher’s exact test, as appropriate. To identify host risk factors for medium and large clusters while controlling for potential confounders, two multivariate logistic regression models were fit, each using the small cluster group as the control group. The variables included in these two models were essential demographic variables (age, sex, and race/ethnicity) and all other variables that had shown disproportionate distributions in the chi-square analysis. All statistical analyses were performed using SAS version 9.2 (SAS Institute, Cary, NC).
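The cluster-size grouping and the unadjusted comparison described above can be illustrated with a short sketch. The analyses in this study were run in SAS; the Python functions below are hypothetical helpers written only to make the definitions concrete. The chi-square function computes the uncorrected Pearson statistic for a 2×2 table and does not replace Fisher’s exact test where expected counts are small.

```python
def cluster_size_group(n_cases):
    """Map a cluster's case count to the size group used in the analysis."""
    if n_cases < 2:
        raise ValueError("a cluster contains at least two cases")
    if n_cases <= 5:
        return "small"
    if n_cases <= 10:
        return "medium"
    return "large"


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = large/small cluster group and
    columns = host factor present/absent."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one case")
    return n * (a * d - b * c) ** 2 / denominator
```

With one degree of freedom, a statistic above 3.84 corresponds to P < 0.05.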
Two M. tuberculosis isolates were selected for genomic comparison to identify large genomic deletions that may account for the observed differences in transmissibility between strains of M. tuberculosis. One isolate (SA201) was selected to represent the strain responsible for the large, persisting cluster of tuberculosis cases. Another isolate (SA178) that caused disease in only one person in the same setting and time period was selected for the comparison.
The criteria for selection of the comparison strain included both clinical and demographic information to ensure that the unique strain’s opportunity for transmission was comparable to that of the clustered strain. First, isolates with unique IS6110 fingerprinting patterns from the counties that also had cases caused by the SA201 strain were selected as potential comparison isolates. The clinical information for the unique isolates was then reviewed, and two strains isolated from patients who had pulmonary cavitary tuberculosis and were sputum smear-positive were identified. Of these two strains, the one isolated from the patient who was younger at the time of diagnosis was selected for the comparison. The patient infected with M. tuberculosis strain SA178 was 66 years old at the time of her diagnosis in 1996 and had resided in the same county for her entire life. Contact investigation of this patient identified 17 contacts, all of whom had negative tuberculin skin tests; all but one showed 0 mm induration.
A microarray-based genomic characterization was performed to identify large sequence polymorphisms (LSPs) in the genome of the isolate (SA201) causing the persistent cluster and the non-clustered isolate (SA178). Single channel DNA microarray hybridizations using the TIGR M. tuberculosis microarray were performed in duplicate. The microarray contains 4,750 70-mer oligonucleotides, printed twice on each slide, representing 4,127 open reading frames (ORFs) from H37Rv and 623 unique ORFs from CDC1551. Four μg of genomic DNA from each strain was digested with the restriction enzyme RsaI. Two μg of the purified digested DNA was labeled with Fluorescein-12-dCTP (PerkinElmer, Boston, MA) using the BioPrime Labeling Kit according to the manufacturer’s instructions (Invitrogen, Carlsbad, CA). A hybridization mixture of 48 μl of 1.25X HybIt hybridization buffer (ArrayIt, TeleChem International, Sunnyvale, CA), 0.6 μl of 10 mg/ml salmon sperm DNA (Invitrogen, Carlsbad, CA), and concentrated probe was prepared. The hybridization mixture was denatured at 94°C for 4 minutes before being applied to the microarray slide. The slide was then transferred to a sealed hybridization chamber and submerged in a 68°C water bath for at least 16 hours. Following hybridization, the slide was washed in 50°C low stringency wash buffer (1X SSC, 0.2% SDS) for 8 minutes. The slide was then washed in high stringency wash buffer (0.1X SSC, 0.2% SDS) for 8 minutes, followed by two washes in 0.1X SSC for three minutes. Detection of hybridized fluorescein-labeled DNA probe was performed using the MICROMAX TSA Labeling and Detection Kit (PerkinElmer, Boston, MA) according to the manufacturer’s instructions. The hybridized microarray slides were scanned and Cyanine 3 intensities were analyzed using the Virtek Chip Reader (Waterloo, Ontario, Canada). Spot signal intensities that were greater than three times the background intensity were counted as positive for hybridization.
Each ORF was tested for hybridization at four spots for each strain. ORFs that were negative for hybridization at two, three, or four of the four spots were considered as possibly having large-sequence polymorphisms (LSPs).
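The two calling rules stated above (a spot is positive when its signal exceeds three times background, and an ORF is flagged when at least two of its four spots are negative) can be written as a minimal sketch. The function names and the `(signal, background)` tuple layout are assumptions made for illustration; the study's actual image-analysis software and data format are not described at this level of detail.

```python
def spot_is_positive(signal, background, fold=3.0):
    """A spot counts as hybridized when signal is strictly greater than
    `fold` times the background intensity."""
    return signal > fold * background


def is_candidate_lsp(spots, fold=3.0, min_negative=2):
    """Flag an ORF as possibly carrying an LSP in one strain.

    `spots` is a list of (signal, background) tuples, four per ORF
    (two printed replicates on each of two duplicate hybridizations).
    The ORF is flagged when at least `min_negative` spots are negative.
    """
    negatives = sum(
        1 for signal, background in spots
        if not spot_is_positive(signal, background, fold)
    )
    return negatives >= min_negative
```

For example, an ORF with hypothetical readings `[(400, 100), (350, 100), (100, 100), (90, 100)]` has two negative spots and would be flagged for PCR confirmation, whereas an ORF with only one negative spot would not.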
To confirm the potential LSPs and determine their location and size, PCR amplification of the ORFs identified as having a potential LSP was performed, followed by automated DNA sequencing of the amplification product. Sequencing was performed on Applied Biosystems DNA Sequencers (Models 3700 and 3730). The gene sequences were compared to those of the appropriate sequenced M. tuberculosis strain (H37Rv or CDC1551) using the BLAST program of the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/BLAST).
Several demographic characteristics of the study subjects were found to be differentially distributed between all cases clustered within 1 year and non-clustered cases (Table 1). Host factors that were significantly over-represented among the clustered cases were age less than 65 years, non-Hispanic black race/ethnicity, homelessness, excessive alcohol use, HIV positivity, and sputum smear positivity. The distributions of male sex, intravenous drug use, and lung cavitation were not significantly different between the two groups. After controlling for potential confounders by multivariate logistic regression analysis, being of non-Hispanic black race/ethnicity [OR=2.07, 95% CI (1.52, 2.82), P<0.0001] and being younger than 65 years [OR=2.30, 95% CI (1.68, 3.14), P<0.0001] remained statistically significantly associated with clustering (Table 2).
To examine host factors for being involved in large clusters, which could be the result of increased transmission and/or a higher likelihood of recent development of TB disease, the 392 clustered cases were divided into three groups based on cluster size, as described above. One hundred and four isolates contained in 5 clusters were grouped into the large cluster group, 68 isolates contained in 9 clusters were grouped into the medium cluster group, and the remaining 220 clustered isolates representing 91 clusters were grouped into the small cluster group. Host factors that were significantly over-represented among the large cluster cases, as compared with small cluster cases, were age less than 65 years, male sex, non-Hispanic black race/ethnicity, homelessness, excessive alcohol use, HIV sero-positivity, and sputum smear positivity (Table 3). The distributions of cavitary disease and intravenous drug use were not significantly different between large cluster cases and small cluster cases. After controlling for potential confounders by multivariate logistic regression analysis, being of non-Hispanic black race/ethnicity (OR=1.89, 95% CI [1.01, 3.55], P=0.047) and being homeless (OR=10.35, 95% CI [2.02, 52.93], P=0.005) remained statistically significantly associated with being in large clusters (Table 4). HIV infection was the only host factor that was significantly over-represented among the medium cluster cases, in comparison with small cluster cases (Table 3). It was also the only independent risk factor for being in medium clusters (Table 4).
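To help read the odds ratios and confidence intervals reported above, the following sketch recovers the log-odds coefficient and its approximate standard error from a reported OR and 95% CI. It assumes the intervals are Wald intervals, that is, symmetric around the coefficient on the log scale; the text does not state which interval type was used, so this is an assumption, and the function name is hypothetical.

```python
import math


def log_scale_summary(odds_ratio, ci_low, ci_high, z=1.959964):
    """Return (beta, se, midpoint) for a reported OR and 95% CI.

    beta     - log odds ratio (the logistic regression coefficient)
    se       - standard error implied by the CI width, assuming a Wald interval
    midpoint - geometric mean of the CI limits; for a Wald interval this
               should reproduce the reported OR up to rounding
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    midpoint = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return beta, se, midpoint


# Reported estimates for large versus small clusters (values from the text).
for label, values in {
    "non-Hispanic black": (1.89, 1.01, 3.55),
    "homeless": (10.35, 2.02, 52.93),
}.items():
    beta, se, midpoint = log_scale_summary(*values)
    print(f"{label}: beta={beta:.2f}, se={se:.2f}, CI midpoint={midpoint:.2f}")
```

Under this assumption, the much larger implied standard error for homelessness reflects the wide interval around its odds ratio, so the point estimate of 10.35 should be read with that imprecision in mind.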
To investigate the potential microbial factors involved in clustering, a microarray-based comparative genomic hybridization was performed to identify LSPs in the genomes of an isolate belonging to a large cluster (SA201) and a non-clustered isolate (SA178). The microarray hybridization followed by PCR and DNA sequencing, to determine the exact location of the deletions or sequence variations, identified 12 different LSPs in the genomes of SA201 and SA178 (Table 3). These 12 LSPs ranged in size from 21 bp to 17,793 bp and included six deletions, three IS6110 insertion-mediated deletion events, one repeat of an adjacent region, one gene replacement event, and one variation in a portion of the gene sequence. Four of the twelve LSPs were exactly the same in the two isolates. These four LSPs affected 16 genes encoding two PPE, a PE_PGRS, four transposases, a lipoprotein, a probable conserved pro-, gly-, val-rich secreted protein, a serine-esterase, and six hypothetical proteins. There were three and five LSPs unique to SA201 and SA178, respectively (Table 4). However, two of these unique LSPs (one in SA201 and one in SA178) involved the same genomic region but the deletion was not exactly the same in the two isolates (Table 3).
Although the incidence of recently transmitted TB as well as reactivation disease has been decreasing in Arkansas, the major fraction of the decrease has been due to a decrease in the incidence of reactivated infections. These findings highlight the need to know more about the factors involved in active transmission. Host risk factors for clustering identified in this study included non-Hispanic black race and age < 65 years. Furthermore, compared with cases in small clusters, a significantly higher proportion of large cluster cases were non-Hispanic black, homeless, younger than 65 years, male, sputum smear-positive, excessive users of alcohol, and HIV sero-positive, with all clustered cases diagnosed within one year of each other. However, being non-Hispanic black and being homeless within the past year were the only two host characteristics identified as independent risk factors for being in large clusters, which represent current transmission. Comparing the genomic content of one of the large cluster strains to that of a non-clustered strain from the same community identified 25 genes that differed between the two strains, potentially contributing to the observed differences in transmission.
Two characteristics were identified as risk factors for clustering of M. tuberculosis genotypes in our study, non-Hispanic black race and younger age, both of which are among the previously known risk factors [8–11]. It is interesting to observe, in the current study, the differences in host risk factors when comparing clustered cases to non-clustered cases and large cluster cases to small cluster cases. Homelessness, a commonly known risk factor for clustering, was not significantly associated with small clusters; however, it was significantly associated with large clusters. It was found previously that M. tuberculosis genotype clusters in our study population represent both clusters resulting from current ongoing tuberculosis transmission, such as the large clusters, and clusters resulting from the reactivation of clusters of cases involved in remote tuberculosis transmission, which are more likely to be seen as small clusters [7, 14]. Our findings suggest that while being non-Hispanic black is a risk factor for both tuberculosis transmission and reactivation, being homeless mainly affects the chance of tuberculosis transmission. The analysis included some large clusters resulting from ongoing outbreaks that might have been caused by highly infectious TB cases. Even so, social/demographic factors (e.g. being non-Hispanic black and homeless) were associated with large clusters, whereas clinical characteristics (sputum smear positivity, pulmonary cavitary disease, and HIV sero-positivity) were not. This observation suggests that social behavioral factors have a more important role in transmission of tuberculosis than does the infectiousness of the source.
Although it has been previously documented, and was confirmed in this study, that host risk factors can play an important role in TB transmission, the ability of M. tuberculosis to be transmitted from host to host is not well understood. Epidemiologic studies have observed that some strains are more successful in transmission than others [9, 15–17]. A large cluster of TB could be explained if the infecting strain has a higher probability of transmission or a higher probability of infection progressing to disease. The mycobacterial cell envelope contains immunomodulatory molecules that are important determinants of intracellular survival and virulence. Two of the genes (MT1800 and MT1802) affected by an LSP in strain SA201, the more widely transmitted strain, but not in SA178, the less successful strain, encode proteins that influence properties of the cell envelope. MT1802 encodes a membrane protein of the MmpL family. Although the function of this member of the MmpL family has not been studied, these membrane proteins are thought to have a function in the transport of lipids across the cell membrane, affecting the structure of the cell envelope. Six of the 16 genes absent in strain SA201, but present in SA178, were of the PE/PPE gene family. PE/PPE family genes are thought to play a role in maintenance of the latent state through antigenic variation. Disruption of these genes may decrease the repertoire of antigens available to the M. tuberculosis strain, decreasing its ability to remain in a latent state. The other important LSP found in this study involved one of the M. tuberculosis lipase-encoding genes, lipR (Rv3084), which was present in strain SA201 and absent in strain SA178.
M. tuberculosis lipases comprise a diverse class of enzymes that are involved in lipid metabolism and may, therefore, have an important role in tuberculosis pathogenesis. Recently, as a follow-up to our microarray findings, Sheline and coworkers explored the association of the LSP in lipR with patient characteristics using a population-based sample of 665 clinical isolates and found that DNA fingerprinting-clustered cases infected with a lipR LSP isolate were more often epidemiologically linked than clustered cases infected with a lipR wild-type isolate. This finding suggests the usefulness of the genomic comparison conducted in the present study. Further studies are needed to investigate whether the presence or absence of any of these 25 genes is associated with large clusters of tuberculosis cases. This will require a larger number of large cluster strains than is present in our M. tuberculosis collection. The 25 genes that differ between the large cluster strain SA201 and the non-clustered strain SA178 identified in this study can serve as a basis for additional functional studies or population-based molecular epidemiologic studies that examine the association of these genetic changes with the ability of M. tuberculosis to cause persistent clusters of disease.
This study was supported by the National Institutes of Health grant NIH-R01-AI151975 and the interagency agreement 98FED10318 between the Veterans Administration, the Centers for Disease Control and Prevention, and the Arkansas Department of Health. The authors thank all of the colleagues in Dr. Zhenhua Yang’s laboratory for helpful discussions during the laboratory investigation and acknowledge Larissa Andersen’s assistance in verifying cluster assignment.
Conflict of interest: None declared.