Rationale: The contribution of interferon-γ release assays (IGRAs) to appropriate risk stratification of active tuberculosis suspects has not been studied.
Objectives: To determine whether the addition of quantitative IGRA results to a prediction model incorporating clinical criteria improves risk stratification of smear-negative–tuberculosis suspects.
Methods: Clinical data from tuberculosis suspects evaluated by the San Francisco Department of Public Health Tuberculosis Control Clinic from March 2005 to February 2008 were reviewed. We excluded tuberculosis suspects who were acid fast–bacilli smear–positive, HIV-infected, or under 10 years of age. We developed a clinical prediction model for culture-positive disease and examined the benefit of adding quantitative interferon (IFN)-γ results measured by QuantiFERON-TB Gold (Cellestis, Carnegie, Australia).
Measurements and Main Results: Of 660 patients meeting eligibility criteria, 65 (10%) had culture-proven tuberculosis. The odds of active tuberculosis increased by 7% (95% confidence interval [CI], 3–11%) for each doubling of IFN-γ level. The addition of quantitative IFN-γ results to objective clinical data significantly improved model performance (c-statistic 0.71 vs. 0.78; P < 0.001) and correctly reclassified 32% of tuberculosis suspects (95% CI, 11–52%; P < 0.001) into higher-risk or lower-risk categories. However, quantitative IFN-γ results did not significantly improve appropriate risk reclassification beyond that provided by clinician assessment of risk (4%; 95% CI, −7 to +22%; P = 0.14).
Conclusions: Higher quantitative IFN-γ results were associated with active tuberculosis, and added clinical value to a prediction model incorporating conventional risk factors. Although this benefit may be attenuated within highly experienced centers, the predictive accuracy of quantitative IFN-γ levels should be evaluated in other settings.
The role of interferon-γ release assays (IGRAs) in the evaluation of active tuberculosis suspects is controversial. To date, whether IGRAs improve classification of smear-negative tuberculosis suspects into clinically relevant risk categories has not been examined.
Quantitative interferon-γ levels measured by QuantiFERON-TB Gold improve risk stratification of smear-negative active tuberculosis suspects when added to objective clinical and demographic risk factors. However, this benefit is attenuated when the judgment of experienced clinicians is also considered.
Interferon-γ release assays (IGRAs) are in vitro immunodiagnostic tests that measure the effector T cell–mediated interferon (IFN)-γ response to Mycobacterium tuberculosis–specific antigens. IGRAs are as sensitive as, and more specific than, the tuberculin skin test for detecting latent tuberculosis infection (LTBI) (1, 2) and correlate better with the gradient of M. tuberculosis exposure (3–8). In 2005, the Centers for Disease Control and Prevention recommended that QuantiFERON-TB Gold (QFT-G; Cellestis, Carnegie, Australia), the first FDA-approved, commercially available IGRA to experience widespread use, could be used for targeted screening of LTBI in all circumstances in which the tuberculin skin test (TST) is used (9).
Although the advantages of IGRAs in diagnosing LTBI are well established, their role in evaluating active tuberculosis suspects remains unclear. IGRAs have variable, although often suboptimal, sensitivity and specificity for diagnosing active tuberculosis (1, 2, 10–16). To date, with the exception of studies examining these assays in parallel with the TST (11, 17), IGRAs have not been considered in light of conventional risk factors for active disease. In addition, whether IGRAs improve prediction of individual patients' risk for active tuberculosis has not been examined.
Acid fast bacilli (AFB) smear-positive tuberculosis suspects can often be triaged with relative ease. However, in suspects whose sputa or other tissue are smear-negative for AFB, clinicians use demographic and clinical risk factors, symptoms, and chest radiograph findings to classify patients into low-, intermediate-, or high-risk categories for active tuberculosis. Patients classified as high risk are typically initiated on antituberculosis therapy, whereas treatment is withheld for low-risk patients. In this study, we use novel risk reclassification methods (18) to assess whether addition of quantitative IFN-γ response measured by QFT-G (Cellestis) to routine clinical evaluation improves risk stratification of individuals suspected of having smear-negative pulmonary and extrapulmonary tuberculosis.
Some of the results of these studies have been previously reported in abstract form (19).
The San Francisco Department of Public Health (SFDPH) operates a central Tuberculosis Control Clinic that routinely screens contacts, immigrants and refugees, as well as hospitalized, private, and community health center patients for LTBI and active tuberculosis in accordance with American Thoracic Society, Centers for Disease Control and Prevention, and Infectious Diseases Society of America guidelines (20). The target population for this study includes AFB smear-negative pulmonary or extrapulmonary tuberculosis suspects who presented to the SFDPH Tuberculosis Control Clinic between March 2005 and February 2008 and had QFT-G performed as part of their initial evaluation. Patients with QFT-G results that were (1) indeterminate, (2) performed more than 14 days before or 14 days after their initial clinic visit, or (3) performed more than 7 days into a course of tuberculosis treatment, were excluded. In addition, patients younger than 10 years of age (where adult-type, nonpaucibacillary disease is uncommon) (21, 22), with a positive AFB smear examination, a known diagnosis of active tuberculosis at presentation, known HIV-infection, or with a final diagnosis of culture-negative tuberculosis, were excluded. Demographic and clinical information was extracted from the SFDPH Tuberculosis Control Clinic electronic database. QFT-G assays were performed at the SFDPH laboratory according to the manufacturer's instructions (23). Patients were considered to have active tuberculosis only when there was culture confirmation of M. tuberculosis. The study protocol was approved by the Committee for Human Research at the University of California, San Francisco.
The analysis included the following steps. First, a novel model selection procedure, the deletion/substitution/addition (DSA) algorithm (24), was used to select the optimal prediction model for culture-confirmed tuberculosis using standard clinical and demographic variables. Covariates were considered for inclusion in the model based on previous studies of risk factors for active tuberculosis and included the following: age, sex, foreign birth, homelessness, contact with an active tuberculosis case, previous history of active tuberculosis, predisposing medical condition (diabetes mellitus, silicosis, cancer, or condition requiring use of immunosuppressive medications), clinical symptoms of active tuberculosis (night sweats, weight loss, or cough), and findings on initial chest radiograph. We also performed a secondary analysis in which clinician suspicion for active disease at the time of patient evaluation (classified as low, intermediate, or high) was added to the baseline clinical prediction model generated by DSA.
Second, patients were classified as low (<5%), intermediate (5–20%), or high risk (>20%) for active tuberculosis based on the probability assigned by the baseline clinical prediction model (this classification was distinct from clinician suspicion for active disease described above). The lower and upper probability cut points for tuberculosis risk categories were selected on the basis of the assumption that empiric tuberculosis treatment would be withheld when the probability of active tuberculosis was below the lower-risk threshold (low risk) and prescribed when the probability was above the higher-risk threshold (high risk). Sensitivity analyses were performed using alternate low- and high-risk thresholds of 2.5 and 10% and 10 and 30%.
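The risk categories described above can be expressed as a small function. This is only an illustrative sketch, not the authors' code: the function name `risk_category` and its parameter names are hypothetical, and the default cut points are the 5% and 20% thresholds stated in the text, with predicted probabilities of exactly 5% or 20% treated as intermediate risk.

```python
def risk_category(probability, low_cut=0.05, high_cut=0.20):
    """Map a predicted probability of active tuberculosis to a risk category.

    Hypothetical helper for illustration. Defaults follow the thresholds in
    the text: low (<5%), intermediate (5-20%), high (>20%).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cut:
        return "low"
    if probability > high_cut:
        return "high"
    return "intermediate"


# Primary thresholds (5% and 20%)
primary = [risk_category(p) for p in (0.03, 0.12, 0.25)]

# One of the alternate threshold pairs from the sensitivity analyses (10% and 30%)
alternate = [risk_category(p, low_cut=0.10, high_cut=0.30) for p in (0.03, 0.12, 0.25)]
```

The sensitivity analyses mentioned above correspond to calling the same function with `low_cut=0.025, high_cut=0.10` or `low_cut=0.10, high_cut=0.30`.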
Third, quantitative IFN-γ results were added to the clinical prediction model. Performance of the prediction models with and without quantitative IFN-γ results was then compared using receiver-operator characteristic analysis (25) and net reclassification index (NRI) (18). Based on prespecified risk thresholds, the NRI reflects the net proportion of patients with culture-positive tuberculosis reclassified into a higher-risk category, plus the net proportion of patients without culture-positive tuberculosis reclassified into a lower-risk category (NRI = [P(up|D = 1) − P(down|D = 1)] − [P(up|D = 0) − P(down|D = 0)]). Final estimates of NRI were obtained using 10-fold cross-validation. Bootstrap confidence intervals for the NRI estimate are reported based on 1,000 resampling iterations.
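To make the NRI formula concrete, the following sketch computes it from risk categories assigned before and after a new marker is added. It is a minimal illustration under stated assumptions: the function name, argument names, and the toy data are invented for this example and are not study data, and the 10-fold cross-validation and bootstrap resampling used in the study are omitted.

```python
# Ordering of the risk categories, used to decide whether a patient moved up or down.
RANK = {"low": 0, "intermediate": 1, "high": 2}


def net_reclassification_index(old_cats, new_cats, has_disease):
    """NRI = [P(up|D=1) - P(down|D=1)] - [P(up|D=0) - P(down|D=0)]."""
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for old, new, diseased in zip(old_cats, new_cats, has_disease):
        diseased = bool(diseased)
        total[diseased] += 1
        if RANK[new] > RANK[old]:
            up[diseased] += 1
        elif RANK[new] < RANK[old]:
            down[diseased] += 1
    if total[True] == 0 or total[False] == 0:
        raise ValueError("need at least one case and one noncase")
    case_term = (up[True] - down[True]) / total[True]
    noncase_term = (up[False] - down[False]) / total[False]
    return case_term - noncase_term


# Invented toy data: four cases followed by four noncases.
old = ["intermediate", "intermediate", "low", "high",
       "intermediate", "intermediate", "low", "high"]
new = ["high", "intermediate", "intermediate", "intermediate",
       "low", "low", "intermediate", "high"]
disease = [True, True, True, True, False, False, False, False]

nri = net_reclassification_index(old, new, disease)
# Cases: 2 up, 1 down -> +0.25; noncases: 1 up, 2 down -> -0.25; NRI = 0.25 - (-0.25) = 0.5
```

In the toy data, movement is appropriate on balance in both groups, so both bracketed terms contribute to a positive NRI. A marker that moved noncases upward more often than downward would reduce the NRI even if cases were reclassified well.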
All P values were two-sided with α = 0.05 as the significance level. All analyses were performed using Stata 10 (Stata Corporation, College Station, TX) and R, version 2.8.1 (R Project for Statistical Computing, http://cran.r-project.org).
Of 1,000 active tuberculosis suspects who had a QFT-G performed as part of their evaluation, 660 were included in the analysis (Figure 1). Of the 660 suspects, 630 (95%) had sputa and 30 (5%) had other tissue sent for AFB smear and culture as part of their diagnostic evaluation. Sixty-five (10%) patients were ultimately diagnosed with culture-confirmed tuberculosis, of whom 14 (22%) had extrapulmonary tuberculosis. Median IFN-γ level was similar in patients with pulmonary and extrapulmonary disease (1.1 IU/ml vs. 1.0 IU/ml; P = 0.89). The study population was predominantly male and foreign-born. Cases were more likely than noncases to have weight loss, night sweats, and chest radiographs with evidence of active disease on presentation, and less likely to have a history of prior active tuberculosis (Table 1). Median IFN-γ level was significantly higher in patients with tuberculosis compared with those without (1.1 IU/ml vs. 0.37 IU/ml; P < 0.001), and higher IFN-γ levels were associated with increased odds of active tuberculosis (odds ratio [OR], 1.07; 95% CI, 1.03–1.11) for each doubling of IFN-γ level. For example, a patient with a quantitative IFN-γ result of 10 IU/ml had a 41% (95% CI, 16–66%) increased odds of active tuberculosis relative to a patient with test results at the manufacturer-recommended cut point of 0.35 IU/ml. Eighty-five percent of cases had quantitative IFN-γ results in the upper three quintiles of IFN-γ concentration (≥0.23 IU/ml), whereas only 6% of cases were in the lowest quintile (<0.04 IU/ml) (Table 1). Sensitivity and specificity of QFT-G for active tuberculosis at the manufacturer-recommended cut point were 72 and 47%, and positive and negative predictive values were 13 and 89%, respectively.
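The per-doubling odds ratio can be converted into an odds ratio between any two IFN-γ levels by counting the number of doublings that separate them. The sketch below assumes that IFN-γ entered the model on a log2 scale, which is what "for each doubling" implies but is not stated explicitly; the function name is hypothetical.

```python
import math


def odds_ratio_between(level, reference, or_per_doubling):
    """Odds ratio comparing `level` with `reference`, given an OR per doubling."""
    if level <= 0 or reference <= 0:
        raise ValueError("IFN-gamma levels must be positive")
    doublings = math.log2(level / reference)
    return or_per_doubling ** doublings


# 10 IU/ml versus the manufacturer-recommended cut point of 0.35 IU/ml
example = odds_ratio_between(10.0, 0.35, 1.07)
```

With the rounded odds ratio of 1.07, the two levels are about 4.8 doublings apart and the result is roughly 1.39, that is, about 39% higher odds. The 41% reported above presumably reflects the unrounded model coefficient.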
A tuberculin skin test was performed in 117 (18%) patients before QFT-G measurement. There was no difference in the proportion of patients with culture-confirmed tuberculosis among those who did and did not have a tuberculin skin test performed before QFT-G (P = 0.41).
The baseline prediction model including objective demographic and clinical predictors classified 182 (28%) patients into low-risk, 407 (62%) into intermediate-risk, and 71 (11%) into high-risk categories. The presence of new infiltrate, pleural effusion, or lymphadenopathy on chest radiograph was most predictive of active tuberculosis (Table 2).
The addition of quantitative IFN-γ results to the baseline prediction model, including demographic and clinical predictors, significantly improved model accuracy (area under the curve [AUC], 0.71 [0.64–0.77] vs. 0.78 [0.73–0.84]; P < 0.001) (Table 2) and 32% (95% CI, 11–52%; P < 0.001) of tuberculosis suspects were appropriately reclassified into higher- or lower-risk categories (Table 3). In comparison to the clinical model alone, both case reclassification (11 more cases classified as high risk and 1 fewer as low risk) and noncase reclassification (88 more noncases designated as low risk and no more noncases classified as high risk) were improved. Results were similar when alternate thresholds were used to define risk categories (see Table E1 in the online supplement). Findings on chest radiograph remained the strongest predictor of active tuberculosis.
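For readers less familiar with the AUC (c-statistic), it equals the proportion of all case–noncase pairs in which the case received the higher predicted probability, with ties counted as one half. The sketch below illustrates that definition only; the function name and the toy probabilities are invented and do not come from the study.

```python
def c_statistic(case_probs, noncase_probs):
    """Share of case-noncase pairs ranked correctly (ties count as 0.5)."""
    if not case_probs or not noncase_probs:
        raise ValueError("need at least one case and one noncase")
    concordant = 0.0
    for case_p in case_probs:
        for noncase_p in noncase_probs:
            if case_p > noncase_p:
                concordant += 1.0
            elif case_p == noncase_p:
                concordant += 0.5
    return concordant / (len(case_probs) * len(noncase_probs))


# Invented predicted probabilities for two cases and three noncases
auc = c_statistic([0.30, 0.10], [0.05, 0.10, 0.20])
# Six pairs: four concordant, one tie, one discordant -> 4.5 / 6 = 0.75
```

Because this measure depends only on how patients are ranked, it can improve without any patient crossing a treatment threshold, which is why reclassification is reported alongside it.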
We performed a secondary analysis to determine whether quantitative IFN-γ levels improved risk reclassification beyond a prediction model that includes clinician suspicion. First, we evaluated whether a similar benefit in risk reclassification occurred when clinician suspicion, rather than quantitative IFN-γ level, was added to the baseline prediction model including objective demographic and clinical data. When clinician suspicion was added to the baseline model, accuracy increased (AUC, 0.71; 95% CI, 0.64–0.77 vs. 0.82; 95% CI, 0.77–0.88; P < 0.001) and 45% of tuberculosis suspects (95% CI, 23–80%; P < 0.001) were appropriately reclassified into higher- or lower-risk categories (data not shown). Next, the addition of quantitative IFN-γ results to this expanded model, including clinician suspicion, significantly increased accuracy (AUC, 0.82; 95% CI, 0.77–0.88 vs. AUC, 0.86; 95% CI, 0.81–0.91; P = 0.02), but did not significantly improve net reclassification (NRI, 4%; 95% CI, −7 to 22%; P = 0.14). Improved prediction among tuberculosis cases was outweighed by worse performance among noncases (Table 4). The addition of QFT-G results at the manufacturer-recommended cut point of 0.35 IU/ml in place of quantitative IFN-γ levels did not materially affect results obtained in either the primary or secondary analysis. To further explore performance in cases and noncases, we examined individual patients' risk before and after quantitative IFN-γ level was added to the model. The majority of culture-proven cases showed an appropriate increase in predicted risk with addition of quantitative IFN-γ results (Figure 2A). However, both decreased (appropriate) and increased (inappropriate) risk prediction was common among noncases (Figure 2B).
In this study, we found that quantitative IFN-γ results significantly improved risk stratification of smear-negative pulmonary and extrapulmonary tuberculosis suspects when added to objective clinical and demographic risk factors. However, this benefit in prediction became attenuated when clinician suspicion was taken into account. These findings indicate that IFN-γ levels obtained from QFT-G, at either the manufacturer-recommended cut point or as a quantitative measure, are unlikely to influence clinical management of active tuberculosis suspects attending highly experienced tuberculosis centers in low-incidence settings.
Risk prediction has long been used in the cardiovascular (26, 27) and cancer (28) literature to improve precision of diagnoses and inform decisions about treatment. Published literature to date assessing IGRA performance has been limited to considerations of sensitivity, specificity, and predictive value, although these measures alone do not describe the predictive accuracy of these assays or the extent to which they improve on readily available clinical information (29). In the absence of an established risk prediction model for AFB smear-negative tuberculosis, we used the DSA routine (24) to identify the optimal prediction model. This state-of-the-art procedure considers nonlinear terms and all possible interactions between predictors. Simultaneously, DSA avoids model overfitting through repeated cross validation. The models generated in this study demonstrate moderate to good discrimination, similar to the Framingham Risk Score for prediction of mortality from coronary heart disease (27, 30).
Previous studies examining quantitative QFT-G results have shown improved sensitivity when using cut points lower than those suggested by the manufacturer (14, 31, 32). However, cut points selected from AUC analysis are influenced by disease prevalence in the population being studied, give equal weight to false-positive and false-negative test results, and may misclassify individuals whose test result falls near the selected cut points (33). Our analyses incorporated IFN-γ levels as a continuous measure, reported diagnostic benefit in light of conventional risk factors, and used novel reclassification methods that allow QFT-G results to be considered in the context of standard clinical decision making. Our overall conclusions weigh the net reclassification results more heavily than improvements in discrimination represented by increases in AUC. Although broadly used as a summary measure of test performance, the area under the receiver-operator characteristic curve (AUC) does not focus on actual risk probabilities and their relation to clinical decision making, and is thus limited in its clinical relevance and use for evaluating risk prediction models (29, 34).
The changes in predicted risk of active tuberculosis following consideration of quantitative IFN-γ results were not uniform. Among intermediate- and high-risk patients who were eventually ruled out for active tuberculosis, the addition of quantitative IFN-γ results led to clinically significant decreases in risk probabilities whether or not clinician suspicion was also included in the prediction model. These findings support previous work emphasizing a high negative predictive value for QFT-G (11). However, approximately one-quarter of low-risk suspects who were eventually ruled out for tuberculosis were inappropriately reclassified as intermediate risk after consideration of quantitative IFN-γ results. The possibility that quantitative IFN-γ results have increased clinical utility in intermediate- and high-risk tuberculosis suspects warrants further study.
Our study has several limitations. First, net reclassification index results depend heavily on both the base prediction model and choice of risk categories. We recognize that addition of IFN-γ to suboptimal base models could produce large improvements in both discrimination and risk reclassification. We used novel methods to optimize our prediction models, and their performance compares well with other well-accepted risk-prediction models (27, 30). In addition, our risk cut points were prespecified, and sensitivity analyses of alternate cut points did not modify our findings. Second, clinician suspicion, as used in our expanded clinical model, could have been influenced in some cases by QFT-G results. This is unlikely to have materially affected our analysis as 85% of all QFT-G results were not available at the time of clinical evaluation and quantitative IFN-γ results are not reported by the SFDPH laboratory. The dramatic improvement in model performance with the addition of clinician suspicion, however, indicates that crucial information is obtained in the workup process beyond our measured covariates. Prospective studies should attempt to better define these factors. Third, the test characteristics of QFT-G In-Tube (QFT-G-IT), the most recent generation of this assay, may differ from QFT-G as used in this study. Lastly, our analysis is most relevant to tuberculosis referral centers with experienced clinicians operating in low-incidence settings.
In conclusion, quantitative IFN-γ results obtained from QFT-G improved clinical evaluation of tuberculosis suspects compared with objective criteria alone. In our highly experienced tuberculosis control clinic, however, subjective assessment of risk by clinicians performed even better. Further studies are needed to examine whether quantitative IGRA results have benefit beyond routine clinician evaluation in other settings.
The authors thank the staff at the San Francisco Department of Public Health, Tuberculosis Control Section.
Supported by grants T32 HL007185 (J.Z.M.), K23 HL094141 (A.C.), R01 AI034238 (P.C.H.), and K23 HL092629 (P.N.) from the National Institutes of Health.
This article has an online supplement, which is accessible in this issue's table of contents at www.atsjournals.org
Originally Published in Press as DOI: 10.1164/rccm.200906-0981OC on October 1, 2009
Conflict of Interest Statement: J.Z.M. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. A.C. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. E.V. has received consultancy fees from Kronos Longevity Research Institute, NPS Pharmaceuticals, Tethys Biosciences ($1001–$5000), and Zelos Therapeutics ($5001–$10,000). He has also received royalties from Springer-Verlag ($5001–$10,000). C.H. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. J.G. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. P.C.H. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. L.M.K. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. P.N. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript.