Rationale: There is uncertainty regarding how to interpret discordance between tests for latent tuberculosis infection.
Objectives: The objective of this study was to assess discordance between commercially available tests for latent tuberculosis in a low-prevalence population, including the impact of nontuberculous mycobacteria.
Methods: This was a cross-sectional comparison study among 2,017 military recruits at Fort Jackson, South Carolina, from April to June 2009. Several tests were performed simultaneously with a risk factor questionnaire, including (1) QuantiFERON-TB Gold In-Tube test, (2) T-SPOT.TB test, (3) tuberculin skin test, and (4) Battey skin test using purified protein derivative from the Battey bacillus.
Measurements and Main Results: In this low-prevalence population, the specificities of the three commercially available diagnostic tests were not significantly different. Of the 88 subjects with a positive test, only 10 (11.4%) were positive to all three tests; 20 (22.7%) were positive to at least two tests. Bacille Calmette-Guérin vaccination, tuberculosis prevalence in country of birth, and Battey skin test reaction size were associated with tuberculin skin test–positive, IFN-γ release assay–negative test discordance. Increasing agreement between the three tests was associated with epidemiologic criteria indicating risk of infection and with quantitative test results.
Conclusions: For most positive results the three tests identified different people, suggesting that in low-prevalence populations most discordant results are caused by false-positives. False-positive tuberculin skin test reactions associated with reactivity to nontuberculous mycobacteria and bacille Calmette-Guérin vaccination may account for a proportion of test discordance observed.
There is substantial discordance between the tuberculin skin test and IFN-γ release assays in populations with low prevalence of tuberculosis, and most positive results from the three tests identify different people.
This study suggests that most positives from any of these tests are false-positives in low-prevalence populations. To support the current recommendations to treat tuberculosis, targeted testing using risk-stratified interpretation should be used for the IFN-γ release assays as with the tuberculin skin test.
There is continued uncertainty as to which diagnostic test for latent tuberculosis infection (LTBI) is most accurate in the United States population: the tuberculin skin test (TST) or IFN-γ release assays (IGRAs), including the QuantiFERON-TB Gold In-Tube test (QFT-GIT) and T-SPOT.TB test (T-Spot). There is no gold diagnostic standard for evaluating the performance of the IGRAs compared with the TST other than the long-term progression to active TB in cohort studies (1). In the absence of a gold standard, IGRAs are routinely compared in practice with the TST in cross-sectional evaluation studies, using active TB cases to assess sensitivity and low-risk populations to assess specificity (2, 3). In these studies, significant discordance is often found between IGRA and TST results. In a study of Navy recruits, 11 (73%) of 15 of the highest-risk individuals (whose country of birth had a rate of active TB of >100 per 100,000 person-years and who had TST reactions of at least 15 mm) had negative QFT-Gold tests (4). There are several explanations for these discordant results, including the use of region of difference one antigens in the IGRAs, which might result in greater specificity. However, it is also possible that the TST may have greater sensitivity, that the IGRAs may detect only unresolved or more recent infections (5), or that TST and IGRAs provide complementary measures of immune response (6).
Nontuberculous mycobacteria (NTM) may be an important potential source of false-positive tests for Mycobacterium tuberculosis infection in areas where the likelihood of infection is very low (7), such as the southeastern United States (8). The late Dr. George Comstock remarked in 1975 that “the frequency of cross-reactions to tuberculin in this [Navy recruit] population is sufficiently great that the prevalence of true tuberculous infections among white recruits may already be approaching zero” (9). The prevalence of sensitization to NTM in the United States population increased from 11% in 1972 to 17% in 2000 (10). Military recruits are an excellent population in which to explore NTM sensitization as a potential source of TST/IGRA discordance, because bacille Calmette-Guérin (BCG) vaccination and age-related waning of TST sensitivity are uncommon and recruits originate from a wide geographic area.
The impact of cross-reactivity on TST results has been previously investigated by comparing results of skin tests performed with purified protein derivative (PPD) made from M. tuberculosis (PPD-Seibert) and several NTM, including Mycobacterium intracellulare. PPD-Battey (PPD-B) is a skin test antigen made from the Boone strain of M. intracellulare in a manner similar to how PPD-Seibert is made from M. tuberculosis. A skin test performed with PPD-B is referred to as a “Battey skin test” (BST). The BST has been used as an aid in the differentiation of reactivity to M. tuberculosis from reactivity to NTM in Navy recruit (8, 11) and National Health and Nutrition Examination Survey studies (10, 12, 13). It has also been used in many other smaller epidemiologic studies (14–19). The objectives of this study were to compare commercially available tests for LTBI in a heterogeneous, low-LTBI prevalence United States population and to assess the impact of NTM reactivity on test discordance.
After providing written informed consent, recruits originating from all areas of the United States, age 18 years or older, undergoing routine entry-level medical processing at Fort Jackson, South Carolina, were screened for participation in the study. Recruits were excluded from participating if they (1) had a history of severe reaction to the TST, (2) were pregnant by urine human chorionic gonadotropin testing, (3) had received a live virus vaccine within the past 30 days, or (4) had a major viral infection at the time of screening.
PPD-B was used as a skin test antigen under an Investigational New Drug Protocol sponsored by the Uniformed Services University in Bethesda, Maryland. The Infectious Diseases Institutional Review Board at Uniformed Services University provided approval and oversight of the study.
This cross-sectional comparison study among Army recruits at Fort Jackson consisted of five elements: (1) a TB risk factor questionnaire, (2) T-Spot, (3) QFT-GIT, (4) BST, and (5) TST.
The TB risk factor questionnaire contained questions about demographics, TB exposure, work history, location of residence, and other factors shown in Table 1. This questionnaire was developed from the risk factors previously identified in the military and nonmilitary literature (20–25), and other factors considered candidates for causal relationships with LTBI.
Blood for QFT-GIT and T-Spot was collected at the time of routine phlebotomy for recruit in-processing. Personnel performing IGRAs were masked to all patient data. QFT-GIT was performed according to package insert instructions, including incubation and centrifugation of blood within the prescribed times at Fort Jackson, and completion of ELISAs at the US Air Force School of Aerospace Medicine, Brooks City-Base, Texas, and the Centers for Disease Control and Prevention (CDC), Atlanta, Georgia (26). ELISAs were performed with the aid of Triturus automated ELISA workstations (Grifols USA, Los Angeles, CA). T-Spot was performed per package insert instructions (27) at the Oxford Immunotec, Ltd. Laboratory, Marlborough, Massachusetts, with the addition of T cell Xtend (Oxford Immunotec, Ltd., Oxfordshire, UK) immediately before peripheral blood mononuclear cell recovery. IGRAs were interpreted according to published guidelines (28); however, in the analysis of quantitative responses, borderline T-Spot results (i.e., TB response of five, six, or seven spots) were coded as “negative.”
TST and BST were placed by study personnel after the blood draw. All personnel involved in placement and reading of the skin test were trained and monitored to strictly adhere to standard operating procedures based on published methods for skin test administration and interpretation (20, 29). The Mantoux technique was used to intradermally administer 0.1 ml (5 TU) of Tubersol tuberculin PPD (Sanofi Pasteur Ltd., Toronto, ON, Canada) and 0.1 ml (0.01 μg) of PPD-B at the same sitting. One skin test was placed on each forearm. A random number table for each recruitment day determined which PPD was placed on each arm. The transverse diameter of induration at each skin test site was measured 2 days after PPD injection. Participants and those administering and reading the skin tests were masked to which skin test antigen was administered on each arm.
Recruits were categorized using a risk stratified interpretation (RSI), as previously described by the CDC (30). The only modifications to the CDC criteria were that no time limitations were placed on contact with an active TB case or immigration from a high-prevalence country. The TB prevalence reported by the World Health Organization in 1990 was used to estimate exposure risk by country using groups of (1) less than 20 per 100,000, (2) 20–100 per 100,000, and (3) greater than 100 per 100,000 (4, 31). BCG status was determined by self report. There was a strong correlation between reported history of BCG vaccination, presence of BCG scar, and foreign birth in this population. There was no significant difference in the results when using history of BCG vaccination or BCG scar (data not shown). Test specificity was estimated by assuming that recruits with no risk factors for M. tuberculosis exposure were uninfected. An invalid test was defined as those with insufficient blood, misplaced or dislodged caps, an insufficient number of peripheral blood mononuclear cells recovered, or other laboratory errors. Test discordance was categorized as “TST positive/IGRA negative” or “TST negative/IGRA positive” for the QFT-GIT and the T-Spot. BST induration size was categorized into four 5-mm intervals and one greater than or equal to 20 mm. A dominant BST reaction was defined as a BST reaction of at least 2 mm greater than the TST reaction.
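The categorical definitions in this paragraph can be written down as a short sketch. The function names are hypothetical, and the exact edges of the four 5-mm BST intervals (0–4, 5–9, 10–14, 15–19 mm) are an assumption, since the text gives only their width.

```python
def who_prevalence_group(cases_per_100k: float) -> str:
    """Assign a country's TB prevalence to one of the three strata in the text."""
    if cases_per_100k < 20:
        return "<20"
    if cases_per_100k <= 100:
        return "20-100"
    return ">100"


def bst_category(induration_mm: float) -> str:
    """Bin BST induration into four 5-mm intervals plus a >=20 mm group.

    Interval edges (0-4, 5-9, 10-14, 15-19) are assumed, not taken from the paper.
    """
    if induration_mm >= 20:
        return ">=20"
    lower = int(induration_mm // 5) * 5
    return f"{lower}-{lower + 4}"


def is_dominant_bst(bst_mm: float, tst_mm: float) -> bool:
    """A BST reaction is dominant when it is at least 2 mm larger than the TST."""
    return bst_mm - tst_mm >= 2
```

For example, `is_dominant_bst(12, 10)` is true, whereas a BST reaction only 1 mm larger than the TST reaction is not counted as dominant.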
The proportions of recruits with a positive TST, T-Spot, and QFT-GIT were compared using the McNemar test for correlated proportions, as were specificity and the proportions of indeterminate and invalid results for each test. The proportions of discordant and concordant results were also measured, and test agreement was assessed using the kappa (κ) coefficient. Factors associated with discordance were evaluated using standard chi-square bivariate statistics, stratified analyses, and multivariate analysis. Prevalence ratios were directly estimated for both bivariate and multivariate analyses. Because the log-binomial model failed to converge owing to numerical instability, Poisson regression with robust variance estimation was used to calculate multivariate prevalence ratios (32). The variables evaluated are listed in Table 1.
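As an illustration of the paired comparisons described here, the following minimal sketch computes a McNemar test and Cohen's kappa from a 2×2 table of paired test results. It assumes the exact (binomial) form of the McNemar test, which the text does not specify, and the cell labels a, b, c, d and function names are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b and c are the two discordant cells of a paired 2x2 table
    (test 1 positive / test 2 negative, and the reverse).
    """
    n = b + c
    if n == 0:
        return 1.0
    # Binomial tail probability under the null hypothesis p = 0.5.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table.

    a = both positive, d = both negative, b and c = discordant cells.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)
```

Only the discordant cells enter the McNemar test, which is why it suits comparisons of tests performed on the same subjects; kappa, in contrast, uses all four cells to correct observed agreement for agreement expected by chance.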
Discordance between TST and IGRA was further assessed using associations between demographic and exposure variables including category of BST induration. TST positive/IGRA negative discordance was assessed separately from TST negative/IGRA positive discordance. The comparison group used for both of these analyses was the group of concordant negatives.
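A bivariate prevalence ratio of the kind described, comparing the prevalence of discordance in an exposure category with that in a reference category, might be estimated as sketched below. This covers only the unadjusted case with a standard log-scale confidence interval; it is not the Poisson regression with robust variance used for the multivariate estimates, and the function and parameter names are hypothetical.

```python
from math import exp, log, sqrt


def prevalence_ratio(exposed_cases: int, exposed_total: int,
                     ref_cases: int, ref_total: int, z: float = 1.96):
    """Unadjusted prevalence ratio with a log-scale confidence interval.

    "Cases" are discordant subjects; totals are discordant plus
    concordant-negative subjects in each exposure group.
    Requires at least one case in each group.
    """
    pr = (exposed_cases / exposed_total) / (ref_cases / ref_total)
    # Standard error of log(PR) for two independent proportions.
    se = sqrt(1 / exposed_cases - 1 / exposed_total
              + 1 / ref_cases - 1 / ref_total)
    return pr, exp(log(pr) - z * se), exp(log(pr) + z * se)
```

Because the interval is symmetric on the log scale, the product of its lower and upper bounds equals the square of the point estimate, which is a convenient sanity check.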
Figure 1 depicts subject participation and follow-up in a flow chart. Of the 3,095 recruits approached from April 1 to June 11, 2009, a total of 2,697 were eligible to participate in the study, of whom 2,017 (75%) enrolled. Of the 39 recruits who withdrew before blood collection or completion of skin testing, 30 did so for administrative reasons unrelated to the study. Characteristics of the remaining 1,978 study participants are shown in Table 1. TST results were available for all of the remaining 1,978 participants, and were read a mean of 45 hours after PPD injection (range, 40–50 h). TST induration was detected in 122 (6.2%) participants and ranged from 2 to 80 mm. No significant digit preference was identified on inspection of the histogram of reaction size (see online supplement). T-Spot and QFT-GIT results were available for 1,913 (96.7%) and 1,850 (93.5%) participants, respectively. QFT-GIT was invalid for 128 (6.5%) subjects, and 17 (0.9%) of the valid QFT-GIT gave indeterminate results. T-Spot was invalid for 65 (3.3%) subjects, 6 (0.3%) of the valid T-Spots were indeterminate, and 23 (1.2%) had borderline results with a TB response between five and seven spots. The relatively high proportion of subjects with invalid tests was caused by a need for numerous tubes of blood for routine recruit in-processing and investigational tests, and an institutional review board restriction against additional phlebotomy solely to collect blood for investigational tests.
Of the 1,803 subjects who had valid positive, negative, or borderline results for all three tests, 1,373 were classified as low-risk for M. tuberculosis infection based on history, but 19 of them had borderline T-Spot results. Among the 1,354 recruits without identifiable risks and with determinate results for all three tests, estimates of TST specificity were 99.3% (95% confidence interval [CI], 98.7–99.7) when using the 15-mm cutoff for positive recommended by the CDC for persons at low risk of exposure (30), or 98.6% (95% CI, 97.8–99.2) when using a 10-mm cutoff. The specificity of the IGRAs was 98.7% for the T-Spot (1,336 negatives among 1,354 low-risk recruits; 95% CI, 97.9–99.2), and 98.8% for the QFT-GIT (1,338 negatives among 1,354 low-risk recruits; 95% CI, 98.1–99.3). Estimates of specificity were unchanged when borderline T-Spot results were coded as negative and included in the analysis (data not shown). None of the differences were statistically significant.
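The IGRA specificity estimates can be reproduced from the counts given above. The text does not state which confidence interval method was used; the sketch below assumes a Wilson score interval, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Negatives among the 1,354 low-risk recruits, as reported in the text.
for name, negatives in (("T-Spot", 1336), ("QFT-GIT", 1338)):
    low, high = wilson_interval(negatives, 1354)
    print(f"{name}: specificity {negatives / 1354:.1%}, "
          f"95% CI {low:.1%} to {high:.1%}")
```

Under this assumption the intervals round to the values reported for the two IGRAs, although an exact binomial interval would give similar bounds at this sample size.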
There were 1,781 subjects who had valid positive or negative results, excluding subjects with indeterminate or borderline results by any test. Table 2 shows the number and proportion of positive tests by test type, and the prevalence of BST reactions among the positives. An analysis of risk factors for positive tests, such as BCG vaccination and foreign birth, is presented in another recent publication (33). The proportion of subjects with a 10-mm or greater TST reaction was significantly larger than with any other test or TST cutoff (P < 0.05), and the proportion of subjects with a 15-mm or greater TST reaction was significantly smaller than that found by RSI or a 10-mm cutoff (P < 0.0001). None of the other differences in proportions was statistically significant. A total of 19 (33%) of 57 recruits with 10 mm or greater TST reactions did not have identifiable risks for M. tuberculosis infection. When using RSI as suggested by the CDC (30), 2.7% were positive, a similar proportion of positive results as was observed for both the T-Spot (1.9%) and QFT-GIT (2%).
Using the RSI for TST, 88 subjects (4.9%) had a positive result to at least one of the three tests. Of these, only 10 (11.4%) were positive to all three tests; 20 (22.7%) were positive to at least two of the tests. Agreement between the TST and the two IGRAs was modest (Tables 3–5). In contrast, good agreement was seen with TST when using different blinded readers (kappa = 0.79; see online supplement).
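The overlap counts in this paragraph follow from tallying, for each subject, how many of the three tests were positive. A minimal sketch is shown below; the function name and the tuple layout (TST, T-Spot, QFT-GIT) are hypothetical, and only the final arithmetic uses numbers reported in the text.

```python
def concordance_counts(results):
    """Count subjects positive to any, at least two, or all three tests.

    results: iterable of (tst, tspot, qft) booleans, one tuple per subject.
    """
    counts = {"any": 0, "two_or_more": 0, "all_three": 0}
    for triple in results:
        positives = sum(bool(x) for x in triple)
        counts["any"] += positives >= 1
        counts["two_or_more"] += positives >= 2
        counts["all_three"] += positives == 3
    return counts


# Arithmetic from the reported counts: 88 positive to any test,
# 20 of them positive to at least two tests.
single_positive = 88 - 20
share_single = single_positive / 88
print(f"{single_positive} of 88 ({share_single:.0%}) positive to only one test")
```

The last lines give the 68 subjects (77%) who were positive to only one of the three tests.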
Of the 48 subjects with a positive TST, 9 (18.8%) had a dominant BST reaction, defined as a BST reaction of at least 2-mm greater than the TST, as shown in Table 2. Table 6 further examines the associations of potential risk factors for TST-positive, IGRA-negative discordance. Strong dose–response relationships were observed between discordance and BST reaction size, TB prevalence in country of birth, and BCG vaccination. No significant associations were seen between any variables and IGRA-positive/TST-negative discordance or T-Spot/QFT-GIT discordance (data not shown).
Among the 1,803 subjects with valid tests and determinate results, Table 7 shows the agreement between the three tests by quantitative result of each test. Subjects with borderline T-Spot results were included in this analysis to assess a continuum of TB responses including five to seven spots. Higher quantitative test results were associated with greater concordance between the tests. This dose–response relationship was highly significant for all three tests. Table 8 shows the quantitative test results for each test according to risk strata. The proportion of higher IGRA responses increased with increasing risk for infection with M. tuberculosis, suggesting that quantitative IGRA results relate to risk in much the same way as TST reaction size. The dose–response relationship between risk of infection with M. tuberculosis and quantitative test result was also highly significant for each test. Similarly, Table 9 shows the association of higher TB risk strata with greater test concordance; this dose–response relationship was also statistically significant.
This study suggests that the three commercially available TB diagnostics have similar results in United States populations with low TB prevalence. IGRAs were designed to increase specificity, but in this study specificity for the IGRAs was no better than TST specificity among low-risk recruits when interpreted using a TST cutoff of 15 mm according to published guidelines. The prevalence of positive results and dose–response relationships with TB exposure were also similar for the three tests. Despite these areas of agreement, the three tests identified different people for most positive test results. In this study, TST-positive, IGRA-negative discordance was strongly associated with BST results, supporting other evidence that NTM sensitization can cause false-positive TST results. Conversely, the IGRAs showed little evidence of cross-reactivity to NTM by the BST. Although this suggests that NTM and BCG sensitization cause false-positive TST results and that this contributes to discordance, these factors do not explain the etiology of most of the discordance encountered.
Other aspects of test discordance examined in this study include the dose–response associations seen between the TB exposure risk, quantitative results of the TST and IGRA testing, and degree of concordance between the three tests. These data suggest that in low-prevalence populations, most positives resulting from any of the three commercially available diagnostic tests are false-positives because (1) 77% of subjects with positive test results were positive by only one test, (2) lower quantitative results were associated with smaller risk for TB exposure, (3) lower quantitative results were associated with single positive tests, and (4) lower risk for TB exposure was associated with decreasing test agreement.
The problem of low positive predictive value is well known and understood with the TST (34). Use of risk stratification is currently recommended to guide the interpretation of the TST as a way to increase positive predictive value and reduce false positivity (30); this is not used for the IGRAs. This study suggests that performance of the IGRAs would also benefit from the use of a risk-stratified interpretation, because it would increase positive predictive value and reduce the number of false-positives. These findings support the CDC's recommendation that people at minimal risk of infection (who are at greatest risk of a false-positive result) should not be targeted for LTBI testing, regardless of whether a TST or IGRA is used (35).
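The dependence of positive predictive value on prevalence can be made concrete with Bayes' rule. In the sketch below the specificity of 0.987 is close to the estimates from this study, whereas the sensitivity of 0.80 and both prevalence values are purely hypothetical inputs chosen for illustration.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical sensitivity and prevalences; specificity near the study estimates.
for prevalence in (0.01, 0.20):
    ppv = positive_predictive_value(0.80, 0.987, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}")
```

With these assumed inputs, a prevalence of 1% gives a positive predictive value of roughly 0.38, so most positives would be false-positives even with a specificity near 99%; restricting testing to a higher-prevalence group raises the value substantially.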
This study provides reliable estimates of specificity in a low-risk population. Although both IGRAs are generally reported to have specificity higher than the TST (2), there was surprisingly little difference in specificity between TST and either IGRA seen in this study. The specificity estimates for TST and IGRA found in this study are similar to those found in previous studies of Navy recruits (4). Although the specificity of QFT-GIT is sometimes thought to be higher than that of T-Spot (2, 3), the estimated specificities of the two tests were not different in this study. The strong dose–response relationships between TB exposure and positive TST and IGRA results were also similar to those reported previously (2, 3). These findings further support the CDC's recommendation that IGRAs may be used in place of the TST, but that testing should be targeted to avoid false-positive results (35).
Although IGRAs and TST may be used in the diagnosis of LTBI, they do not give equivalent information and often have discordant results. Several studies have compared results from different IGRAs and from TST “head-to-head” (28, 36–42), and although the agreement between QFT-GIT and T-Spot has generally been very good, discordant results between the IGRA and TST have been found in up to 20–30% of subjects (3). The magnitude of discordance is demonstrated in this study by the low kappa values and the high proportion of discordance seen among positives, because 68 (77%) of 88 individuals with at least one positive test were positive to only one of the three tests. The frequency of test discordance has varied among studies, leading some authors to conclude that the IGRAs have lower sensitivity (36), whereas others have concluded that the IGRAs have better specificity because of less cross-reactivity with BCG vaccine and to waning sensitivity because of age (28). The differences may also be caused by differences in the populations studied.
A few studies have provided evidence that NTM contribute to discordance between the TST and IGRA (4, 39), but none have used the BST. In this study, the strong dose–response relationship between increasing BST reaction size and increasing prevalence of discordance provides additional evidence that false-positive TSTs contribute to this discordance. BCG vaccination was also strongly associated with discordance in this study. However, risk-stratified TST-positive, IGRA-negative discordance was also associated with TB prevalence in country of birth and being Asian or from the Pacific Islands, traditionally factors associated with high risk of developing disease if infected. Thus, some of the discordance also may be attributable to lower sensitivity of the IGRAs compared with TST, or a combination of these two factors.
A limitation of this study is the lack of a gold standard for determining the presence of M. tuberculosis infection, making it difficult to assess the true significance of discordance between TST and IGRAs. The significance of reactivity to BST is also somewhat uncertain. Although it has previously been shown to assist in differentiating between LTBI and cross-reactivity caused by NTM (8, 11), BST reactivity also may be caused by cross-reactivity after M. tuberculosis infection (16, 43). Furthermore, there are other mycobacteria that contain region of difference one antigens, such as M. kansasii, M. szulgai, or M. marinum; infection with these NTM may cause false-positive reactions to TST and IGRAs (2, 44). There is potential for misclassification of several variables, including the recall of BCG vaccination among recruits, history of prior TB or LTBI diagnosis or treatment, and contact with a TB case. Although samples were sent masked to all participating laboratories, the potential still exists for other residual sources of misclassification bias. Recruits are a low-risk population and may not represent the causes of test discordance in other higher-risk populations. Furthermore, because this research was performed in the high-throughput basic training setting, the administrative limitations imposed resulted in a larger proportion of inadequate blood draws and in TST reading times that were slightly shorter than optimal.
This study highlights the need for better understanding of the significance of test discordance, particularly the need for longitudinal data on progression to active TB among those with discordant test results. Applying the methodology used in this study to other populations (11, 12) may provide a more complete understanding of the test interpretation and test discordance. Finally, further research is needed to better characterize the most appropriate cutoffs to be used for the risk-stratified interpretation of the IGRAs, to maximize sensitivity and specificity in different risk groups and populations.
The authors thank Christine Anderson, Ph.D., (Food and Drug Administration) for graciously supplying her expertise in preparing the Battey antigen and testing it for human use. She also provided valuable comments on the manuscript during preparation. This study was greatly assisted by the incredible energy and expertise of Ms. Carey Schlett of the Infectious Disease Clinical Research Program. Her guidance and constant supervision were invaluable to the completion of the study. The authors also thank Dr. Richard Menzies, who provided invaluable advice and expertise in designing and setting up this study.
Supported by the Infectious Disease Clinical Research Program, a Department of Defense program executed through the Uniformed Services University of the Health Sciences. This project has been funded in whole, or in part, with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), under Inter-Agency Agreement Y1-AI-5072. The content of this publication is the sole responsibility of the authors and does not necessarily reflect the views or policies of the NIH, the Department of Health and Human Services, the Centers for Disease Control and Prevention, the Department of Defense, or the Departments of the Army, Navy, or Air Force. Mention of trade names, commercial products, or organizations does not imply endorsement by the US Government. The Infectious Disease Clinical Research Program (Bethesda, MD) participated in all phases of the study, including design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. Oxford Immunotec (Marlborough, MA) performed T-Spot testing (masked) as an in-kind contribution, but played no other role in the design, conduct, collection, management, analysis, interpretation of the data, preparation, review, or approval of the manuscript. Laboratory and technical support was also provided by the US Air Force School of Aerospace Medicine and the Centers for Disease Control and Prevention's Division of TB Elimination.
Author Contributions: J.D.M., D.T., G.H.M., C.O., N.E.A., L.G., D.G., and L.W.K. all had substantial participation in conception and design of the study, acquisition or analysis of data, interpretation of the data, and revision of the article.
This article has an online supplement, which is accessible from this issue's table of contents at www.atsjournals.org
Originally Published in Press as DOI: 10.1164/rccm.201107-1244OC on December 8, 2011