In this study, we found that quantitative IFN-γ results significantly improved risk stratification of smear-negative pulmonary and extrapulmonary tuberculosis suspects when added to objective clinical and demographic risk factors. However, this benefit in prediction was attenuated when clinician suspicion was taken into account. These findings indicate that IFN-γ levels obtained from QFT-G, whether interpreted at the manufacturer-recommended cut point or as a quantitative measure, are unlikely to influence clinical management of active tuberculosis suspects attending highly experienced tuberculosis centers in low-incidence settings.
Risk prediction has long been used in the cardiovascular (26) and cancer (28) literature to improve precision of diagnoses and inform decisions about treatment. Published literature to date assessing IGRA performance has been limited to considerations of sensitivity, specificity, and predictive value, although these measures alone do not describe the predictive accuracy of these assays or the extent to which they improve on readily available clinical information (29). In the absence of an established risk prediction model for AFB smear-negative tuberculosis, we used the DSA routine (24) to identify the optimal prediction model. This state-of-the-art procedure considers nonlinear terms and all possible interactions between predictors. Simultaneously, DSA avoids model overfitting through repeated cross-validation. The models generated in this study demonstrate moderate to good discrimination, similar to the Framingham Risk Score for prediction of mortality from coronary heart disease (27).
Previous studies examining quantitative QFT-G results have shown improved sensitivity when using cut points lower than those suggested by the manufacturer (14). However, cut points selected from AUC analysis are influenced by disease prevalence in the population being studied, give equal weight to false-positive and false-negative test results, and may misclassify individuals whose test result falls near the selected cut points (33). Our analyses incorporated IFN-γ levels as a continuous measure, reported diagnostic benefit in light of conventional risk factors, and used novel reclassification methods that allow QFT-G results to be considered in the context of standard clinical decision making. Our overall conclusions weigh the net reclassification results more heavily than improvements in discrimination represented by increases in AUC. Although broadly used as a summary measure of test performance, the area under the receiver operating characteristic curve (AUC) does not focus on actual risk probabilities and their relation to clinical decision making, and is thus limited in its clinical relevance and use for evaluating risk prediction models (29).
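The distinction between discrimination and reclassification can be illustrated with a minimal sketch. The code below is not the analysis code used in this study; it assumes a standard rank-based AUC and a categorical net reclassification index, and the risk cut points (0.10 and 0.50), the predicted risks, and the outcomes are all hypothetical values chosen only for illustration.

```python
from bisect import bisect_right


def auc(risks, outcomes):
    """Rank-based AUC: probability that a randomly chosen case has a higher
    predicted risk than a randomly chosen non-case (ties count as 1/2)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    noncases = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in noncases)
    return wins / (len(cases) * len(noncases))


def risk_category(risk, cut_points):
    """Index of the risk category (0 = lowest). cut_points must be sorted
    ascending; a risk equal to a cut point falls in the higher category."""
    return bisect_right(cut_points, risk)


def net_reclassification(base_risks, new_risks, outcomes, cut_points):
    """Categorical NRI, returned as (cases, non-cases, total).

    Cases contribute (moved up - moved down) / n_cases; non-cases contribute
    (moved down - moved up) / n_noncases. The total is the sum of the two."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for base, new, y in zip(base_risks, new_risks, outcomes):
        move = risk_category(new, cut_points) - risk_category(base, cut_points)
        count[y] += 1
        up[y] += move > 0
        down[y] += move < 0
    nri_cases = (up[1] - down[1]) / count[1]
    nri_noncases = (down[0] - up[0]) / count[0]
    return nri_cases, nri_noncases, nri_cases + nri_noncases


# Hypothetical data: 1 = active tuberculosis, 0 = ruled out.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted risks without and with the new marker.
base_risks = [0.30, 0.15, 0.60, 0.20, 0.40, 0.05, 0.12, 0.55]
new_risks = [0.55, 0.25, 0.70, 0.08, 0.30, 0.12, 0.06, 0.20]
# Hypothetical low / intermediate / high risk boundaries.
cut_points = [0.10, 0.50]

print("AUC, base model:", auc(base_risks, outcomes))
print("AUC, new model: ", auc(new_risks, outcomes))
print("NRI (cases, non-cases, total):",
      net_reclassification(base_risks, new_risks, outcomes, cut_points))
```

Because `auc` depends only on the ordering of predicted risks, any monotone rescaling of the risks leaves it unchanged, whereas `net_reclassification` changes whenever a patient crosses one of the chosen cut points. This is one way to see why reclassification results relate more directly to clinical decision thresholds than AUC does.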
The changes in predicted risk of active tuberculosis following consideration of quantitative IFN-γ results were not uniform. Among intermediate- and high-risk patients in whom active tuberculosis was eventually ruled out, the addition of quantitative IFN-γ results led to clinically significant decreases in risk probabilities whether or not clinical suspicion was also included in the prediction model. These findings support previous work emphasizing a high negative predictive value for QFT-G (11). However, approximately one-quarter of low-risk suspects in whom tuberculosis was eventually ruled out were inappropriately reclassified as intermediate risk after consideration of quantitative IFN-γ results. The possibility that quantitative IFN-γ results have increased clinical utility in intermediate- and high-risk tuberculosis suspects warrants further study.
Our study has several limitations. First, net reclassification index results depend heavily on both the base prediction model and choice of risk categories. We recognize that addition of IFN-γ to suboptimal base models could produce large improvements in both discrimination and risk reclassification. We used novel methods to optimize our prediction models, and their performance compares well with other well-accepted risk-prediction models (27). In addition, our risk cut points were prespecified, and sensitivity analyses of alternate cut points did not modify our findings. Second, clinician suspicion, as used in our expanded clinical model, could have been influenced in some cases by QFT-G results. This is unlikely to have materially affected our analysis as 85% of all QFT-G results were not available at the time of clinical evaluation and quantitative IFN-γ results are not reported by the SFDPH laboratory. The dramatic improvement in model performance with the addition of clinician suspicion, however, indicates that crucial information is obtained in the workup process beyond our measured covariates. Prospective studies should attempt to better define these factors. Third, the test characteristics of QFT-G In-Tube (QFT-G-IT), the most recent generation of this assay, may differ from QFT-G as used in this study. Lastly, our analysis is most relevant to tuberculosis referral centers with experienced clinicians operating in low-incidence settings.
In conclusion, quantitative IFN-γ results obtained from QFT-G improved clinical evaluation of tuberculosis suspects compared with objective criteria alone. However, in our highly experienced tuberculosis control clinic, subjective assessment of risk by clinicians performed even better. Further studies are needed to examine whether quantitative IGRA results have benefit beyond routine clinician evaluation in other settings.