|Home | About | Journals | Submit | Contact Us | Français|
BackgroundPreclinical studies and Monte Carlo simulations have suggested that there is a relatively limited role of adherence in acquired drug resistance (ADR) and that very high levels of nonadherence are needed for therapy failure. We evaluated the superiority of directly observed therapy (DOT) for tuberculosis patients vs self-administered therapy (SAT) in decreasing ADR, microbiologic failure, and relapse in meta-analyses.
MethodsProspective studies performed between 1965 and 2012 in which adult patients with microbiologically proven pulmonary Mycobacterium tuberculosis were separately assigned to either DOT or SAT as part of short-course chemotherapy were chosen. Endpoints were microbiologic failure, relapse, and ADR in patients on either DOT or SAT.
ResultsTen studies, 5 randomized and 5 observational, met selection criteria: 8774 patients were allocated to DOT and 3708 were allocated to SAT. For DOT vs SAT, the pooled risk difference for microbiologic failure was .0 (95% confidence interval [CI], −.01 to .01), for relapse .01 (95% CI, −.03 to .06), and for ADR 0.0 (95% CI, −0.01 to 0.01). The incidence rates for DOT vs SAT were 1.5% (95% CI, 1.3%–1.8%) vs 1.7% (95% CI, 1.2%–2.2%) for microbiologic failure, 3.7% (95% CI, 0.7%–17.6%) vs 2.3% (95% CI, 0.7%–7.2%) for relapse, and 1.5% (95% CI, 0.2%–9.90%) vs 0.9% (95% CI, 0.4%–2.3%) for ADR, respectively. There was no evidence of publication bias.
ConclusionsDOT was not significantly better than SAT in preventing microbiologic failure, relapse, or ADR, in evidence-based medicine. Resources should be shifted to identify other causes of poor microbiologic outcomes.
Tuberculosis treatment with short-course chemotherapy has 3 aims: rapid bactericidal activity, which is measured by sputum conversion; sterilizing activity, which is measured by relapse; and suppression of acquired drug resistance (ADR). The World Health Organization's (WHO) DOTS (directly observed therapy, short-course) program was developed to ensure success of this chemotherapy. DOTS has 5 components: political commitment by governments, improved laboratory services, a continuous supply of good-quality drugs, documentation of individual patients’ success and program progress toward set targets, and direct observation by a healthcare worker of each patient swallowing pills (ie, directly observed therapy [DOT]). Historical trends of the decline of multidrug-resistant tuberculosis rates with implementation of the program, especially the dramatic reports from New York City and other large cities, provided powerful examples of the success of the program [1–4]. DOT, the namesake and heart of the program, is the most expensive [5–7]. However, DOT is considered by the World Bank to be one of the “most cost-effective of all health interventions, and indispensable to preventing ADR and relapse” [8, 9]. The several studies that were pivotal to the adoption of the DOTS program were retrospective, or employed quasi-experimental designs, and often emphasized the benefit of program-defined treatment outcomes [1–4, 8–12]. They did not tease out the effect of DOT from other program components. In contrast, one meta-analysis of prospective studies found no major benefit of DOT compared to self-administered therapy (SAT) for program-defined outcomes such as “cure” and “completion of treatment” in both active and latent tuberculosis . In another systematic review, there was also no significant benefit for the outcome of recurrence . However, in some high-burden countries such as in South Africa, up to 77% of recurrence is due to new infection and not relapse [15, 16].
Because DOT is now the accepted standard of care everywhere, performance of randomized controlled trials in which some patients are randomized to SAT or DOT or placebo pills to see if ADR emerges more easily would be unethical . To address this limitation, we recently performed hollow-fiber studies in which various degrees of nonadherence were examined during both bactericidal and sterilizing effect . Surprisingly, microbiologic failure occurred only when >60% of doses were missed, but no ADR was encountered. Thus, we hypothesize that DOT has no impact on rates of sputum conversion, ADR, or relapse in tuberculosis patients. To test that hypothesis, we performed a meta-analysis of prospective clinical studies that compared DOT to SAT and reported microbiologic outcomes. We were particularly interested in microbiologic outcomes as primary outcomes, as it is a standard tenant of infectious diseases therapeutics that the best evidence for eradication of pathogens or ADR, or relapse, is microbiologic demonstration , and not program factors such as “completion of therapy.”
We used WHO definitions . DOT refers to the practice of supervising tuberculosis patients swallowing all their pills over the entire course of treatment by trained health personnel who are accountable to tuberculosis control staff. SAT refers to unsupervised administration of prescribed antituberculosis drugs by patients. We defined partial DOT as the practice in which patients are on DOT for only portions of the therapy duration. Defaulting refers to missing a cumulative ≥2 months of doses after initially taking at least 1 month's worth of medication. Patients reported as lost to follow-up by randomized clinical trials were included in the defaulting category. Microbiologic failure refers to positive smear microscopy or culture at the fifth month or later on therapy. Patients who had their treatment changed for persistent bacteriologic positivity or because of radiologic and/or clinical deterioration, including those with “doubtful responses,” were classified as having failed treatment. ADR was defined as new or additional resistance to 1 or more of the first-line antituberculosis drugs among failures or relapses. Relapse was when a patient was declared cured but subsequently developed microbiologically proven disease . Molecular genotyping of repeat isolates was not performed.
We searched PubMed, Embase, ISI Web of Science, and the Cochrane Library for studies published between 1 January 1965 and 31 December 2012. There was no exclusion of articles by language. Bibliographies of original articles, key reviews, and consensus statements were also searched for additional relevant studies [8, 10, 13, 14]. The following Medical Subject Heading terms and strategy was used: directly observed therapy OR supervised therapy OR directly observed treatment strategy OR DOT OR DOTS AND self-administered therapy OR self-supervised therapy OR unsupervised therapy AND tuberculosis. In addition, we also searched for articles in the gray literature at Inside Conferences, clinicaltrials.gov, and Open Grey (System for Information on Grey Literature in Europe; http://www.opengrey.eu).
Inclusion criteria were prospective studies in which patients were diagnosed by microscopic examination of sputum smear or culture and were separately assigned to either DOT or SAT, treatment using a short-course chemotherapy regimen that includes isoniazid, rifampin, and pyrazinamide and evidence of evaluation for microbiologic failure. Studies were limited to prospective data from observational studies or controlled trials with concurrent controls. We excluded retrospective studies to avoid selection and information biases, studies carried out in children, studies that used retreatment regimens, and treatment in patients with a prior history of tuberculosis.
Study selection was done independently by the 2 investigators. Reviewer agreements were measured using the κ statistic. The quality of each trial was graded by use of validated scores . Disagreements were resolved by consensus.
The primary outcome was microbiologic failure. The secondary outcomes were ADR, relapse, and default.
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines .
We quantified heterogeneity of effect using the I2 statistic [23, 24]. We calculated the incidence rate (IR) and 95% confidence intervals (CIs) for DOT or SAT, for each study for each of the outcomes based on the number of events reported in each original study. We also computed for a second effect size measure, which is the risk difference (RD). This was used because several cells had zero outcomes events, which makes it difficult to calculate relative risk (RR) without imputation of data or excluding studies. However, all 3 effect sizes were reported, with IR and RD considered the primary. To permit unbiased comparison of outcome, we employed an “intention to treat” strategy (ie, by original assigned treatment groups, irrespective of whether treatment was subsequently changed), except when not stated by the primary study, when we analyzed outcomes as all patients randomized . We decided a priori to use the DerSimonian and Laird random methods to pool effect size across studies, as these methods would provide more conservative CIs [23, 25, 26]. Fixed-effects models were used to pool effect size if there was no significant heterogeneity (ie, I2 ≤ 50%); otherwise, mixed-effects models were used for I2 > 50%. We employed mixed-effects models, in which random-effects analyses were used to combine IR of groups within each study, using Comprehensive Meta-Analysis software (Biostat Inc, Englewood, New Jersey). Study-to-study variance (T2) was not pooled across studies; however, it was computed within groups and was not assumed to be the same for all groups. Publication bias and small study effects were systematically evaluated by visual inspection for funnel plot asymmetry and by use of the Egger test [23, 26].
First, we examined the effect of removing one study at a time on effect size for microbiologic failure, ADR, and relapse. Second, we examined the effect of study design (randomized controlled trials vs observational studies) on effect size. Third, we examined whether combining all patients classified as partial DOT with either DOT or SAT led to significant changes in effect size. Fourth, we examined the role of study locale (rural patients vs urban patients) on effect size. Fifth, we examined the effect of study quality score on effect size.
To further explore potential source of heterogeneity, we performed meta-regression analyses in which study design and study locale were simultaneously examined as covariates. Random-effects meta-regression was utilized; we expected some unexplained or “residual” heterogeneity. The weight for each trial was equal to the inverse of the sum of the within-trial variance and the residual between-trial variance, in order to correspond to a random-effects analysis. An iterative method providing restricted maximum likelihood estimates of regression parameters, their asymptotic variance, and the residual heterogeneity variance was performed in Stata version 12.
Ten of 129 initially identified studies (8%) met selection criteria [5, 27–35], as shown in Figure Figure1.1. The κ value was 0.92 for the inclusion of studies and 0.90 for the rating of trials on considered methodologic aspects. There were 5 randomized controlled trials and 5 observational studies. The characteristics of included studies are shown in Table Table1,1, as is the quality score for each study, which demonstrates that all 10 were good-quality studies. The combined number of participants enrolled in the selected studies was 13 752. From these, 13 112 (95%) participants were assigned or randomized to intervention: 8774 (67%) to DOT, 630 (5%) to partial DOT, and 3708 (28%) to SAT. Thus, the proportion of patients who received partial DOT was small, and this group was excluded from further computation of effect size.
Significant heterogeneity of effect was observed in the 9 of 10 studies that reported defaulting as an outcome (I2 = 68%; P = .02); therefore, mixed effects models were employed. Results are shown in Figure Figure2.2. SAT (n = 3192) had worse defaulting than DOT (n = 8269), based on pooled RD of −0.05 (95% CI, −.07 to −.04; Figure Figure2).2). The pooled IR was 19.4% (95% CI, 18.0%–21.0%) on SAT vs 8.8% (95% CI, 6.1%–9.5%) on DOT (Table (Table2).2). If we calculated RR by omitting studies with zero cells, the pooled RR was 0.48 (CI, .43–.54), confirming that whichever one of the 3 effect sizes was utilized, DOT was associated with lower defaulting rates compared to SAT.
For microbiologic failure, 10 studies randomized patients to either SAT (n = 3376) or DOT (n = 8625). The combined I2 was 0%, indicating no significant heterogeneity. Therefore, fixed-effects models were utilized. The pooled RD for patients on DOT vs SAT was 0.0 (CI, <−.01 to .01; Figure Figure3).3). The results held true regardless of whether only randomized controlled trials were considered or observational studies were added (Figure (Figure3).3). No single study demonstrated a significantly higher risk with SAT compared to DOT. The IR was 1.5% (95% CI, 1.3%–1.8%) on DOT vs 1.7% (95% CI, 1.2%–2.2%) on SAT (Table (Table3).3). Moreover, the pooled RR for failure on DOT vs SAT was 1.20 (CI, .81–1.78). No significant small study effects or publication bias was observed based on the Egger test and funnel plot examination (Figure (Figure44).
Three studies reported relapse [27, 29, 34]. The studies had significant heterogeneity (I2 = 68%); therefore, random-effects models were utilized. The pooled RD for relapse on SAT (n = 649) compared to DOT (n = 649) was 0.01 (95% CI, −.03 to .06; Figure Figure5);5); the IR was 3.7% (95% CI, .7%–17.6%) on DOT vs 2.3% (95% CI, .7%–7.2%) on SAT (Table (Table3).3). The pooled RR was 1.49 (95% CI, 0.31–7.19) for DOT compared to SAT. There was no significant publication bias or small study effects observed (Figure (Figure66).
The 2 ADR studies were heterogeneous (I2 = 69%). The pooled RD was 0.0 (95% CI, −.01 to .01) when DOT (n = 415) was compared to SAT (n = 532; Figure Figure7);7); the IR was 1.5% (95% CI, .2%–9.0%) for patients on DOT and 0.9% (95% CI, .40%–2.30%) for patients on SAT (Table (Table3).3). The RR of ADR on DOT vs SAT was 1.40 (95% CI, .20–9.98).
In subgroup analysis, microbiologic failure for rural/urban studies was significantly higher on DOT compared to SAT (P = .045). The pooled RD for studies performed in urban locales was 0.004 (95% CI, −.016 to .008), whereas the RD from rural/urban studies was 0.004 (95% CI, .00–.009). This suggested that rural patients were more likely to fail on DOT compared to SAT. However, there were no studies performed solely in rural areas. No significant changes in pooled RD were encountered when we systematically removed 1 study at a time in influence analysis (Supplementary Figure 1). Next, we examined whether combining all patients classified as partial DOT with either DOT or SAT, or grouped studies by country (hence program quality), or by study design, led to significant changes in conclusions. There was no significant change in effect size for microbiologic failure or ADR or relapse, for all (Supplementary Figures 2–4).
For microbiologic failure, the percentage residual variation due to heterogeneity for a model comprising study design and study locale was 0% and the joint test for both covariates revealed a P = .34. The restricted maximum likelihood estimate for between-study T2 was 0. The RD for study design was 0.01 (95% CI, −.01 to .02), whereas that for study locale was −0.01 (95% CI, −.02 to .01). Thus the findings from the meta-regression demonstrate no other source of variation for the effect obtained, which suggests that there was no significant difference between SAT and DOT.
Well-documented decreases in ADR in several cities and countries have provided strong historical evidence of the success of DOTS, based on decreased defaulting rates [1–4, 8–12]. A prior analysis of Volmink and Garner, in a mixture of patients with latent and active tuberculosis, found that DOT was not superior to SAT for the program-defined outcomes of “completion of treatment” . We found that defaulting rates were indeed reduced by DOT. However, despite the poorer defaulting rates on SAT, we found no difference in microbial failure, ADR, or relapse, between DOT and SAT, similar to our findings in our previously published in vitro hollow-fiber studies . One possible potential explanation for the discrepancies with historical data is that those studies were retrospective, and those that were prospective employed quasi-experimental designs. In evidence-based medicine, the highest quality of scientific evidence comes from >1 properly randomized controlled trial, whereas the lowest quality is generally that of descriptive studies or opinions of authorities, whether or not there is consensus . Notably, no single study demonstrated a significantly higher risk of microbiologic failure with SAT compared to DOT. We speculate that the DOTS program is associated with a large infusion of resources such as upgrade in expertise and a reliable supply of drugs, and that the regular contact with a patient further provides a higher level of support apart from direct supervision of therapy, which would lead to apparent improvement in outcomes in retrospective studies, independent of DOT.
Our findings should not be read as questioning the entire DOTS program, but are limited to supervision of patients swallowing pills. Although the full program is often accompanied by an infusion of resources, the DOT component itself consumes an inordinate portion of that, which is a problem in resource-constrained settings . This may explain the suggested association between rural residence and microbiologic failure. We speculate that economic constraints were the most likely driver accounting for this observation. It may be that requiring patients to frequently come and pick up their medicines or to be observed swallowing their pills could actually impose economic hardships in some parts of the world, leading to microbiologic failure. Moreover, in some high-burden countries, baseline adherence rates measured using validated methods are already >97% on SAT , and there may be no room for further improvement in adherence with DOT. We propose that, instead, a concerted effort should be made to shift the resources toward the other possible reasons for such failure, beyond DOTS, including pharmacokinetic/pharmacodynamics and pharmacokinetic and microbial variability [18, 38]. However, the nature of the data reported precluded us from investigating the role of such factors in the current meta-analysis.
There are several limitations to our analyses. First, the WHO definitions we used, particularly for the secondary outcome of “defaulting,” are subject to different interpretations. Second, various DOT supervisors and various forms of DOT were employed by the selected studies, whereas some of the studies did not explicitly state whether DOT was for the initial 2 months of therapy only or for the entire treatment duration. Hence, these data are subject to misclassification bias, which can lead to erroneous failure to reject the null hypothesis . However, the influence and sensitivity analyses we performed did not reveal significant change in the pooled RD, suggesting that these findings are internally robust. Third, it has also been argued that the quality of DOTS programs has an impact on results of meta-analysis, and therefore analysis should be stratified by quality of program. However, we performed a stratified analysis by quality of DOTS program using country as a surrogate, and DOT was still no better than SAT. Fourth, differences in study design and the heterogeneity between studies could make our conclusions less reliable. As an example, it could be that less reliable patients were assigned to DOT whereas more reliable patients were assigned SAT in the observational studies, which would bias the results. However, analysis of randomized studies alone vs analysis that included observational studies did not alter the conclusions (Figure (Figure3).3). Fifth, ADR and relapse studies were fewer and these were of different study design. The single randomized clinical trial revealed higher risk for relapse with SAT compared to DOT when RR was calculated (RR, 1.74 [95% CI, .93–3.26]); however, it did not achieve statistical significance. For the observational studies, the pooled RR was 1.13 (95% CI, .02–54.91). These results were partly due to zero cells and the imputation strategies inherent with using RR as effect size. That is why our primary effect sizes were RD and IR, which require no such imputation. The differences by study design vanished when those effect sizes were employed. Finally, an inherent limitation of meta-analyses is that some influential studies may be missed during the search, thereby biasing the studies. However, we excluded no studies by publication language, examined the Cochrane database and the gray literature, and performed a manual search of references in key publications, in order to minimize bias.
In conclusion, our evidence-based medicine approach found that DOT was not superior to SAT in terms of microbiologic outcomes. Other causes of poor microbiologic outcomes should be sought in new studies.
Supplementary materials are available at Clinical Infectious Diseases online (http://cid.oxfordjournals.org/). Supplementary materials consist of data provided by the author that are published to benefit the reader. The posted materials are not copyedited. The contents of all supplementary data are the sole responsibility of the authors. Questions or messages regarding errors should be addressed to the author.
Financial support.This study was funded by the National Institutes of Health (NIH; grant number R01AI079497).
The NIH was not involved in the design and conduct of the study; collection, management, analysis, and interpretation of data; and preparation, review, or approval of the manuscript.
Potential conflicts of interest.T. G. has been a consultant for Merrimack Pharmaceuticals and has received research grants from Pfizer, Merck, and Astellas Pharma for work on antifungal agents. J. G. P. reports no potential conflicts.
Both authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.