Background. The interferon-γ release assays (IGRAs) are increasingly being used as an alternative to the tuberculin skin test (TST). Although IGRAs may have better specificity and certain logistical advantages over the TST, their use may contribute to overtesting of low-prevalence populations if testing is not targeted. The objective of this study was to evaluate the accuracy of a risk factor questionnaire in predicting a positive test result for latent tuberculosis infection using the 3 commercially available diagnostic tests.
Methods. A cross-sectional comparison study was performed among recruits undergoing Army basic training at Fort Jackson, South Carolina, from April through June 2009. The tests performed included (1) a risk factor questionnaire (RFQ); (2) the QuantiFERON Gold In-Tube test (Cellestis Limited, Carnegie, Victoria, Australia); (3) the T-SPOT.TB test (Oxford Immunotec Limited, Abingdon, United Kingdom); and (4) the TST (Sanofi Pasteur Ltd., Toronto, Ontario, Canada). Prediction models used logistic regression to identify factors associated with positive test results, and RFQ prediction models were developed independently for each test.
Results. Use of a 4-variable model resulted in 79% sensitivity, 92% specificity, and a c statistic of 0.871 in predicting a positive TST result. Targeted testing using these risk factors would reduce testing by >90%. Models predicting IGRA outcomes had specificities similar to those of the skin test model but had lower sensitivities and c statistics.
Conclusions. As with the TST, testing with IGRAs will result in false-positive results if the IGRAs are used in low-prevalence populations. Regardless of the test used, targeted testing is critical in reducing unnecessary testing and treatment.
Clinical Trial Registration. NCT00804713.
Universal screening for latent tuberculosis infection (LTBI) in the United States is no longer recommended; current practice favors a targeted approach. Centers for Disease Control and Prevention (CDC) guidelines recommend targeted testing of only persons with known risk factors for TB, specifically stating that “targeted tuberculin testing programs should be conducted only among groups at high risk and discouraged in those at low risk” [1, p. 1]. Similarly, the Institute of Medicine has called for the development of targeted TB screening programs based on epidemiologic risk analysis. Targeted testing offers logistical and efficiency advantages over universal screening and increases the positive predictive value (PPV) of a positive test result by selecting a higher-prevalence population for testing. In settings with relatively homogeneous exposure, such as among immigrants, in prisons, and in hospitals, universal testing is still performed because of the association with a high-risk setting, although health care workers are increasingly being recognized as having heterogeneous exposures. Targeted testing has been implemented and evaluated using predictive models in several heterogeneous contexts, including contact investigations [5, 6], pediatric populations [7, 8], and university entrants. These studies have all found that targeted testing may dramatically reduce the amount of testing without negative effects on disease control efforts.
The US military has performed universal testing of recruits entering military service since the 1960s. However, the US military has a low risk for active TB, with a rate of 0.65 cases of confirmed pulmonary TB per 100000 person-years from 1998 to 2007, a rate 84% lower than the age-adjusted US rate (J. D. Mancuso; Walter Reed Army Institute of Research; unpublished data). Because recruits have heterogeneous exposures to TB prior to accession (ie, entry into military service), the US military is challenged with mitigating the risk of TB among higher-risk recruits without exposing low-risk recruits to unnecessary therapy. Targeted testing relies on identifiable risk factors and an assessment tool that can accurately predict LTBI. Interferon-γ release assays (IGRAs), including the QuantiFERON® Gold In-Tube test (QFT) (Cellestis Limited, Carnegie, Victoria, Australia) and the T-SPOT®.TB test (T-Spot) (Oxford Immunotec Limited, Abingdon, United Kingdom), are thought to be more specific than the tuberculin skin test (TST) (Tubersol®, Sanofi Pasteur Ltd., Toronto, Ontario, Canada) and to have other logistical advantages over it. This study evaluates the accuracy of a risk factor questionnaire (RFQ) in predicting positive TST, QFT, and T-Spot results and uses these results to compare the effectiveness of targeted testing strategies based on the 3 commercially available diagnostic tests for LTBI.
The study was approved in 2009 by the Uniformed Services University Institutional Review Board and was conducted in the same year. Informed consent was obtained from all subjects. This cross-sectional comparison study among Army recruits at Fort Jackson, South Carolina, consisted of (1) the RFQ, (2) the QFT, (3) the T-Spot, and (4) the TST. A total of 2017 subjects were enrolled in the study from 1 April to 11 June 2009.
The RFQ was developed from previous literature and validation studies [5, 6, 8–10, 12]. The RFQ was administered before all other TB testing, and subjects were encouraged to complete all fields; it took participants 3–5 minutes to complete. The primary variables of interest included foreign birth, race and ethnic background, and contact with a case of active TB. Other factors included demographic characteristics, foreign residence or travel, exposure assessment, bacille Calmette-Guérin (BCG) vaccination, history of prior TB diagnosis or treatment, and prior positive skin test result, as shown in Table 1. Human immunodeficiency virus (HIV) infection and other immunosuppressive conditions are disqualifying for entry into military service and therefore are not reported here. The TB prevalence reported by the World Health Organization in 1990 was used to estimate exposure risk in the country of birth and during overseas travel or residence, using 3 groups: (1) <20 cases per 100000 persons, (2) 20–100 cases per 100000 persons, and (3) >100 cases per 100000 persons [13, 14].
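The 3-level exposure grouping amounts to a simple lookup on the WHO prevalence estimate. The sketch below is a minimal illustration, assuming the 1990 prevalence for a country is already available as cases per 100000 persons. The function name `prevalence_group` is hypothetical, and placing values of exactly 20 and 100 in the middle group is an assumption, because the text gives only the ranges.

```python
def prevalence_group(cases_per_100k: float) -> int:
    """Map a TB prevalence estimate (cases per 100000 persons) to exposure group 1-3.

    Assumption: boundary values of exactly 20 and 100 are assigned to group 2,
    because the source gives only "<20", "20-100", and ">100".
    """
    if cases_per_100k < 0:
        raise ValueError("prevalence cannot be negative")
    if cases_per_100k < 20:
        return 1  # low-prevalence country
    if cases_per_100k <= 100:
        return 2  # intermediate-prevalence country
    return 3      # high-prevalence country


# Hypothetical example: a country with 150 cases per 100000 persons.
print(prevalence_group(150))  # 3
```

The same function would be applied separately to the country of birth and to each country of overseas travel or residence.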
Blood samples were obtained for QFT and T-Spot at the time of routine phlebotomy for recruit inprocessing. Personnel performing IGRA assays were blinded to all patient data. QFT was performed in accordance with the package insert instructions, including incubation and centrifugation within the prescribed times. Testing was completed with the aid of a Triturus automated enzyme-linked immunosorbent assay workstation (Grifols USA). Blood samples for T-Spot were collected into sodium heparin tubes and shipped overnight at room temperature to the Oxford Immunotec laboratory (Marlborough, MA). T-Spot was performed in accordance with the package insert instructions, except for the addition of 25 μL/mL T cell Xtend (Oxford Immunotec) immediately before peripheral blood mononuclear cell recovery, to extend the processing window from 8 hours to up to 32 hours.
All personnel involved in placing and reading the skin tests were trained and monitored to ensure strict adherence to standard operating procedures based on published methods [17, 18]. The Mantoux technique was used to administer 0.1 mL (5 TU) of Tubersol tuberculin PPD (Sanofi Pasteur). The transverse diameter of induration at each skin test site was measured 2 days after administration.
SAS software, version 9.2 (SAS Institute), was used for all analyses. A positive TST result was defined using risk-stratified interpretation (RSI) criteria from published CDC guidelines. IGRA end points were defined using the manufacturers' established cutoffs [15, 16]. The RFQ was used to develop predictive models for positive TST and IGRA results [5–9], and RFQ prediction models were developed independently for each of the 3 diagnostic tests. Factors associated with a positive result were evaluated using the Pearson χ2 test, the Fisher exact test, and unconditional logistic regression. An alpha level of .05 was used to identify significance in all statistical tests.
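Under RSI, the TST cutoff depends on the person's risk category, so the same induration can be read as positive for one recruit and negative for another. The CDC guideline assigns cutoffs of 5, 10, or 15 mm according to risk category; the sketch below encodes a deliberately simplified subset of those categories for illustration only. The class `TstRiskProfile`, its flags, and the functions `rsi_cutoff_mm` and `tst_positive` are hypothetical names, and the flag list is not the full set of criteria applied in the study.

```python
from dataclasses import dataclass


@dataclass
class TstRiskProfile:
    # Hypothetical, simplified flags; the CDC guideline lists more categories.
    recent_tb_contact: bool = False             # 5-mm category
    immunosuppressed: bool = False              # 5-mm category (eg, HIV infection)
    recent_immigrant_high_burden: bool = False  # 10-mm category
    congregate_setting: bool = False            # 10-mm category
    injection_drug_use: bool = False            # 10-mm category


def rsi_cutoff_mm(p: TstRiskProfile) -> int:
    """Return the induration cutoff (mm) under this simplified RSI scheme."""
    if p.recent_tb_contact or p.immunosuppressed:
        return 5
    if p.recent_immigrant_high_burden or p.congregate_setting or p.injection_drug_use:
        return 10
    return 15  # no known risk factor


def tst_positive(induration_mm: float, p: TstRiskProfile) -> bool:
    """A TST is read as positive when induration meets the risk-based cutoff."""
    return induration_mm >= rsi_cutoff_mm(p)


print(tst_positive(12, TstRiskProfile()))                         # False
print(tst_positive(12, TstRiskProfile(congregate_setting=True)))  # True
```

This is why an induration of ≥10 mm is not by itself a positive result under RSI: for a recruit with no listed risk factor, the cutoff is 15 mm.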
Sensitivity, specificity, PPV, and negative predictive value (NPV) were calculated for each RFQ variable response and for different combinations of variables. Missing data were rare (generally <1%), so no imputation was performed. Receiver operating characteristic (ROC) curves were constructed by plotting sensitivity versus 1 − specificity at each predicted probability level. Variables were selected for inclusion in the models on the basis of prior knowledge of risk factors for TB exposure and their contribution to the predictive ability of the model. The contribution to the model was assessed by the change in the area under the ROC curve (AUC), or c statistic, when each predictor variable was added to the model. To validate the prediction models, the analysis was repeated on samples obtained by bootstrap methods [19, 20]. For this analysis, 1000 bootstrap samples of the same size as the original data set were drawn from the original data set with replacement. The AUC and odds ratio estimates from the bootstrap validation were reported and compared with those from the original data set.
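The c statistic has a direct interpretation: it is the probability that a randomly chosen test-positive subject receives a higher model score than a randomly chosen test-negative subject, with ties counted as one-half. The sketch below computes it that way and then recomputes it on bootstrap resamples drawn with replacement at the original sample size. It is a simplified illustration, not the SAS procedure used in the study: the function names are hypothetical, and the scores are held fixed rather than refitting the logistic model on each resample as a full bootstrap validation would.

```python
import random
from typing import List, Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as concordance over all positive-negative pairs (ties count 1/2)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def bootstrap_c(scores: Sequence[float], outcomes: Sequence[int],
                n_boot: int = 1000, seed: int = 0) -> List[float]:
    """Resample subjects with replacement (same size as the data) and recompute
    the c statistic. Simplification: scores are fixed; the model is NOT refit."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        # Skip resamples containing only one outcome class (AUC undefined).
        if 0 < sum(ys) < n:
            stats.append(c_statistic([scores[i] for i in idx], ys))
    return stats


# Hypothetical predicted probabilities and test results (1 = positive).
scores = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0]
print(c_statistic(scores, outcomes))  # 0.75 (3 of 4 pairs concordant)
```

The pairwise loop is quadratic in the number of subjects, which is adequate for a sketch but slow for many resamples of a large data set.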
Of the 3095 recruits approached from 1 April to 11 June 2009, 2697 were eligible to participate in the study, and 2017 subjects (75%) enrolled (Figure 1). Thirty-eight recruits withdrew before blood collection or completion of skin testing; 30 of these withdrawals were for administrative reasons unrelated to the study. TST results were available for 1978 (99.9%) of the remaining 1979 participants, and T-Spot and QFT results were available for 1888 (95.4%) and 1835 (92.7%), respectively. For comparability between the prediction models, this analysis was limited to subjects who had positive or negative results for all 3 tests (n = 1783) and excluded subjects with an indeterminate or invalid result by any test.
Characteristics of study participants are shown in Table 1 and were similar to those of the overall recruit population. TST induration was detected in 105 participants (5.9%) and ranged from 2 to 80 mm. Induration of ≥10 mm was seen in 58 subjects (3.3%), but only 38 (2.1%) were positive by RSI criteria. Of note, 15 of the 38 (39%) did not have any of the traditional risk factors used by CDC to stratify TST interpretation but had induration of ≥15 mm. Similar proportions of positive results were seen for the IGRAs, with 34 positive T-Spot results (1.9%) and 36 positive QFT results (2.0%).
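The percentages in the participant flow and baseline results use 3 different denominators: the 2697 eligible recruits, the 1979 participants remaining after 38 withdrawals, and the 1783-subject analytic sample. The short check below uses only counts stated in the text and makes each denominator explicit; the helper name `pct` is our own.

```python
def pct(num: int, den: int) -> float:
    """Percentage rounded to 1 decimal place."""
    return round(100 * num / den, 1)


eligible, enrolled = 2697, 2017
remaining = enrolled - 38   # participants after withdrawals
analytic = 1783             # positive or negative on all 3 tests

flow = {
    "enrolled / eligible": pct(enrolled, eligible),  # reported as 75%
    "TST available": pct(1978, remaining),
    "T-Spot available": pct(1888, remaining),
    "QFT available": pct(1835, remaining),
    "any TST induration": pct(105, analytic),
    "TST >= 10 mm": pct(58, analytic),
    "TST positive (RSI)": pct(38, analytic),
    "T-Spot positive": pct(34, analytic),
    "QFT positive": pct(36, analytic),
    "RSI positives without CDC risk factor": pct(15, 38),  # reported as 39%
}
for label, value in flow.items():
    print(f"{label}: {value}%")
```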
The unadjusted associations of demographic and exposure risk factors with test results are shown in Table 1. Birth in a TB-endemic country had a particularly strong association with a positive test result, as did age, race, contact with a TB case, and other established risk factors [6, 8, 9, 21, 22]. The multivariable associations of selected factors with positive TST or IGRA results are shown in Table 2. After adjustment for the other variables in the model, significant associations were found between a positive test result and exposure to a TB case, TB prevalence in the country of birth, residence with a family member born outside the United States, a prior positive TST result, and residence in a congregate setting, such as a homeless shelter, prison, or drug treatment facility. To account for possible overfitting, the models were validated with bootstrap estimates. The bootstrap estimates were similar to those obtained in the original data set, although some predictors were no longer statistically significant and the AUCs were moderately lower.
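For a single binary risk factor, an unadjusted association can be summarized by an odds ratio from a 2 × 2 table. The sketch below computes that odds ratio with a Woolf (log-based) 95% confidence interval; the function name and the counts in the example are hypothetical. The adjusted estimates in Table 2 come from logistic regression, which this sketch does not implement.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio and Woolf 95% CI from a 2x2 table.

    a = exposed and test-positive,   b = exposed and test-negative,
    c = unexposed and test-positive, d = unexposed and test-negative.
    Requires all cells > 0 (no continuity correction is applied).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical counts, not taken from this study.
or_, lo, hi = odds_ratio_woolf(10, 90, 20, 1680)
print(f"OR {or_:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

With few positive results, as in this study, small cells make the Woolf interval wide; the Fisher exact test named in the methods is the usual alternative for testing such tables.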
The characteristics of the RFQ models in predicting positive tests are shown in Tables 3, 4, and 5. As expected, the sensitivity, NPV, and AUC of the RFQ improved with increasing numbers of predictors, with corresponding decreases in RFQ specificity and PPV. A 4-variable model for RFQ prediction of a positive TST result was selected as having the best bias-variance tradeoff, with a sensitivity of 79% (95% confidence interval [CI], 63%–90%), specificity of 92% (95% CI, 91%–93%), and AUC of 0.871. Because only 9.3% of subjects had a positive response to at least 1 of these 4 variables, restricting testing to these subjects would be expected to reduce testing by 90.7% (95% CI, 89%–92%). In contrast, when all potential risk factors were included, 32.5% of subjects had at least 1 “positive” response; this increased the sensitivity only slightly while dramatically lowering specificity.
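The targeting rule described here flags a subject for testing if at least 1 of the selected questions is answered positively, and the reduction in testing is the share of subjects with no positive answer (100% − 9.3% = 90.7% for the 4-variable model). The sketch below evaluates such a rule against a reference test result, assuming each subject's responses are already coded as booleans; the function name `evaluate_any_rule` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def evaluate_any_rule(responses: Sequence[Sequence[bool]],
                      outcomes: Sequence[bool]) -> Dict[str, float]:
    """Flag a subject if ANY selected risk-factor response is positive, then
    compare the flag with the reference test result."""
    tp = fp = fn = tn = 0
    for row, positive in zip(responses, outcomes):
        flagged = any(row)
        if flagged and positive:
            tp += 1
        elif flagged:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "fraction_tested": (tp + fp) / n,
        "testing_reduction": (fn + tn) / n,
    }


# Hypothetical toy data: 6 subjects, 2 questions each.
responses = [[True, False], [False, False], [False, True],
             [False, False], [True, True], [False, False]]
outcomes = [True, True, False, False, True, False]
print(evaluate_any_rule(responses, outcomes))
```

Adding questions can only flag more subjects under this rule, which is why sensitivity rises and specificity falls as predictors are added.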
Figure 2 compares the receiver operating characteristic curves for the performance of the full RFQ model in predicting positive TST, QFT, and T-Spot results, graphically demonstrating the lower performance of the RFQ in predicting an IGRA outcome than in predicting a TST outcome. Tables 4 and 5 show the characteristics of the independently created prediction models for the QFT and T-Spot, using the same combinations of variables as used in Table 3 for the TST. Although the specificities and NPVs were similar, the sensitivity, PPV, and AUC of the RFQ were all considerably lower in predicting a positive IGRA than a positive TST. For the 4-variable model, the RFQ had a sensitivity of 44% (95% CI, 28%–62%), specificity of 91% (95% CI, 90%–93%), and AUC of 0.684 in predicting a positive QFT. The RFQ had very similar estimates in predicting a positive T-Spot, with a sensitivity of 44% (95% CI, 27%–62%), specificity of 91% (95% CI, 90%–93%), and AUC of 0.688.
Risk factors for LTBI among US Army recruits were similar whether measured by the TST or either of the 2 commercially available IGRAs. RFQ prediction models were constructed using variables including birth in a country with a high prevalence of TB, close contact with an active TB case, history of living with a family member born outside the United States, and history of a prior positive TB skin test result. Use of these 4 variables resulted in 79% sensitivity, 92% specificity, and an AUC of 0.871 in predicting a positive TB skin test. Targeted testing of only those with a positive response to at least 1 of these 4 questions would reduce testing by >90%, increasing the efficiency of the testing program. Prediction models for the IGRAs had similar specificities and reductions in testing but lower sensitivities and AUCs.
This is the first study to compare the effectiveness of targeted testing using either IGRA as an end point in any population, and the first to compare targeted testing as a predictive tool using IGRA and TST criterion standards. As in previous studies that used the TST result as the outcome, birth in a TB-endemic country was found to be a strong predictor of a positive test result [7–9, 22]. Close contact with a TB case, foreign-born family members, and a prior positive TST result have also been associated with LTBI in previous studies [7–9, 23]. Other variables have sometimes been associated with a positive TST result, including travel, smoking, male sex [5, 9], health care work, and education, but these were not found to be important predictors of LTBI in this study. Race and ethnic group did not contribute meaningfully as predictors after adjusting for other factors. The only study to assess use of a questionnaire to target testing in a similar heterogeneous adult population was conducted among college students in Virginia. That study showed that using only 2 variables, foreign birth and close contact with a patient with TB, resulted in a sensitivity of 81.6% and a specificity of 91%. Although our 2-variable model had lower sensitivity, we found comparable sensitivity and specificity using a 4-variable model. Two studies in pediatric populations also found that using 4 or 5 questions to identify high-risk patients with LTBI resulted in sensitivities and specificities similar to those seen in this study [7, 8]. Prediction models of LTBI among contacts of active TB cases have yielded more modest reductions in testing, because of a higher pretest probability of infection and greater concern about false-negative than about false-positive results [5, 6].
This study has several important strengths. Despite other differences (such as age), the population was a good geographic representation of the underlying low-prevalence, heterogeneous US source population (data not shown). Also, performing all 3 forms of TB testing allowed direct comparison of the effectiveness of targeted testing in predicting LTBI as measured by each test, which has not been done previously. There are also several limitations to this study, the most important of which is the lack of a gold standard for determining the presence of LTBI. The potential for false-positive TST results due to receipt of BCG, cross-reactivity with nontuberculous mycobacteria, and other factors is well known. The IGRAs are also known to have limitations in sensitivity and specificity, and it is uncertain whether the predictive capability of the IGRAs is better than that of the TST. The small number of positive test results may also have limited the power to detect small differences between the groups studied. Misclassification of exposures and outcomes due to measurement error was also possible, although the outcomes were probably better controlled in this study than they would be in practice. Finally, the results of this study are not expected to be generalizable to higher-risk populations, including those with HIV infection or other immunosuppressive conditions, hospital workers, or prison guards.
An important implication of this study is that targeted testing of heterogeneous populations is feasible and effective using the TST or either IGRA. Targeted testing had previously been validated in only a few populations, and only using the TST. In this study, targeted testing was less predictive of commercially available IGRA results than of TST results but still had effectiveness comparable to universal testing. Although the IGRAs may be more specific tests, their use in low-prevalence populations will still result in predominantly false-positive results if testing is not targeted. Therefore, all testing of low-prevalence populations should be targeted, regardless of the diagnostic test used.
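The claim that even a highly specific test yields mostly false-positive results in a low-prevalence population follows from Bayes' theorem. The sketch below computes PPV and NPV from sensitivity, specificity, and pretest prevalence; the function name is our own, and the input values are hypothetical round numbers chosen for illustration, not estimates from this study.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from Bayes' theorem, using expected proportions
    of true/false positives and negatives in the tested population."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test: 80% sensitive, 98% specific.
ppv_untargeted, _ = predictive_values(0.80, 0.98, 0.01)  # 1% pretest prevalence
ppv_targeted, _ = predictive_values(0.80, 0.98, 0.20)    # 20% after targeting
print(round(ppv_untargeted, 2), round(ppv_targeted, 2))   # 0.29 0.91
```

At 1% prevalence, roughly 7 of every 10 positive results from this hypothetical test would be false positives; raising the pretest prevalence by targeting reverses that ratio without changing the test itself.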
This study demonstrates that targeted testing using an RFQ is a useful strategy to test for LTBI and can be operationalized with acceptable performance characteristics using any of the commercially available tests, consistent with CDC recommendations. Although the RFQ in this study was better at predicting a positive TST result than a positive result on either IGRA, this does not demonstrate that the TST is superior for use in conjunction with targeted testing, for several reasons. First, this study may have been somewhat biased by the use of the CDC RSI criteria, because some of the risk factors evaluated by the RFQ are also used by the RSI criteria to define a positive TST result. Similarly, including a history of a prior positive TST result as a predictor may bias the prediction model in favor of the TST, although the RFQ still had superior sensitivity and specificity in predicting the TST, compared with the IGRAs, even when this risk factor was discarded, as seen in the 3-variable models in Tables 3, 4, and 5. It is concerning that 56% of positive IGRA results would be missed by the use of the RFQ, compared with 21% for the TST. However, the vast majority of these discordant positive results were positive for only 1 of the 3 tests and had no identifiable risk factors, suggesting that most were false-positive results. In addition, the known risk factors for TB had weaker associations with the IGRAs than with the TST, and no new risk factors were identified using the IGRAs. Finally, the RSI used for the TST increased the specificity of the test, decreasing the number of false-positive results; because this is not currently done for the IGRAs, it may bias this type of evaluation against them. Therefore, the most likely explanation for the lower predictive ability in a low-prevalence population such as this is false-positive IGRA results. This suggests that risk-stratified interpretation of IGRAs, as is done for the TST, may be useful.
It also suggests that IGRAs should not be used to replace targeted testing, because testing in low-prevalence populations will still result in false-positive results, even if the specificity is very high.
As with the TST, testing with IGRAs will result in false-positive results if IGRAs are used in low-prevalence populations. Regardless of the test used, targeted testing is critical in reducing unnecessary testing and treatment and is consistent with CDC guidelines. Targeted testing in this population would reduce testing by >90%, which would in turn reduce the costs of the screening program and adverse events from therapy while still maintaining effectiveness. Some studies suggest that more than 50% of all positive results in low-prevalence populations may be false-positive results due to nontuberculous mycobacteria and other factors [3, 26, 27]. Targeted testing should therefore reduce treatment of people with false-positive results, who derive no benefit from LTBI therapy but still incur the risk of adverse events.
Future work suggested by this study includes further analysis to improve targeted testing in US and other populations. An analysis of the magnitude and relative cost-effectiveness of targeted testing programs using the IGRAs versus the TST is also warranted. Prediction models may also be considered in other populations, including health care workers, prison guards, long-term travelers, and military service members deploying to TB-endemic countries [28, 29]. Finally, studies comparing the long-term rate of progression to active TB among TST- and IGRA-positive persons would allow a more accurate determination of LTBI status and risk of progression to active TB.
This study would not have been possible without the financial and other support of the Uniformed Services University of the Health Sciences, the US Army Public Health Command, the Infectious Disease Clinical Research Program, the medical and installation leadership at Fort Jackson, and the Soldiers who participated in the study.
This study was greatly assisted by the incredible energy and expertise of Ms Carey Schlett of the Infectious Disease Clinical Research Program. Her guidance and constant supervision were invaluable to the completion of the study.
We also thank Oxford Immunotec for performing the T-Spot test as an in-kind contribution.
We thank Dr Betty Johnson from Virginia Commonwealth University for sharing her TB risk assessment tool during the development phase of this study.
Laboratory and technical support was also provided by the US Air Force School of Aerospace Medicine and the Centers for Disease Control and Prevention’s Division of TB Elimination. The content of this publication is the sole responsibility of the authors and does not necessarily reflect the views or policies of the National Institutes of Health or the Department of Health and Human Services, the Department of Defense, or the Departments of the Army, Navy, or Air Force. Mention of trade names, commercial products, or organizations does not imply endorsement by the US Government.
Financial support. This study (IDCRP-021) was supported by both the US Army Public Health Command and the Infectious Disease Clinical Research Program (IDCRP). The IDCRP is a Department of Defense program executed through the Uniformed Services University of the Health Sciences. This project has been funded, in whole or in part, with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, under Inter-Agency Agreement Y1-AI-5072.
Potential conflict of interest. All authors: No reported conflicts.
All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed in the Acknowledgments section.