Two panels were established for HIV-1 detection: a sensitivity/specificity panel consisting of HIV-1-positive and -negative plasma samples and a seroconversion panel consisting of serial blood samples from patients with acute HIV-1 infection. Negative plasma samples either were negative by all HIV assays or, if positive by any test, were negative by Western blotting. Designated positive plasma samples were all positive by Western blotting. In the seroconversion panel, all patients ultimately became Western blot positive. Thus, the panels are defined by reference to a unique composite standard that relies in large part on Western blot results for definitive designation of specimen status. The Western blot has traditionally been the gold standard against which HIV assays are compared. It should be noted, however, that false-positive Western blots have been reported (7) and that, when comparing tests to a gold standard, the best a test can do is match the Western blot results. If a test is actually better, or if the Western blot is incorrect in some instances, the more accurate test would look worse.
Another caution relates to the statistical analyses. Most differences in test or algorithm performance were small and incremental, as expected for tests with excellent individual performance. In general, differences in sensitivity or specificity of greater than 1.3% or 1.4% were statistically significant, but these data were not corrected for multiple comparisons. This study was not intended to be a comparison of test performance in a particular algorithm. Rather, we compared algorithm strategies using current test combinations to assess the relative advantages and the magnitude of the differences between algorithm strategies.
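The section does not state which significance test was used. For paired comparisons of two assays run on the same specimens, one common choice is McNemar's test on the discordant pairs; a minimal sketch with hypothetical counts:

```python
def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar chi-square statistic for paired binary results.

    b = specimens positive by test A but negative by test B
    c = specimens negative by test A but positive by test B
    Concordant pairs do not contribute to the statistic.
    """
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)

# Hypothetical discordant counts: 20 specimens favor test A, 5 favor test B.
chi2 = mcnemar_chi2(20, 5)
# Compare against the 3.84 critical value (chi-square, 1 df, alpha = 0.05).
significant = chi2 > 3.84
```

Whether a fixed percentage difference reaches significance depends on the panel size and on how many pairs are discordant, which is why small absolute differences can be significant on large panels.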
Three multitest algorithm strategies were evaluated. These can be viewed as specificity-optimized, sensitivity-optimized, and tie-breaking algorithms. The two dual-test algorithms differ in the interpretation of discordant test results. If both tests are required to be concordantly positive to be scored as positive (discordance equals a negative result), specificity is optimized (Table ). This strategy is in principle similar to the current U.S. PHS-recommended algorithm, with the proviso that discordant specimens may be reported as indeterminate or negative (4). Conversely, if discordants are scored as positive, sensitivity is optimized (Table ). Overall, the average differences in sensitivity and specificity between the two dual-test algorithms for all combinations tested were 2.7% and 1.8%, respectively (1.1% and 1.4% for the serologic tests only). The specificity-optimized algorithm could be performed sequentially (the second test is done only if the first test is positive). In the sensitivity-optimized algorithm, both tests would have to be run on all specimens. The three-test algorithm represents a favorable compromise between the two dual-test algorithms, with an improvement in sensitivity greater than the loss of specificity relative to the specificity-optimized dual-test algorithm and, conversely, an improvement in specificity greater than the loss of sensitivity relative to the sensitivity-optimized dual-test algorithm (Table ).
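The decision logic of the three strategies can be sketched with boolean test results (a minimal illustration; the function names are ours, not from the source):

```python
def specificity_optimized(t1: bool, t2: bool) -> bool:
    # Discordance counts as negative: both tests must be positive.
    return t1 and t2

def sensitivity_optimized(t1: bool, t2: bool) -> bool:
    # Discordance counts as positive: either positive test suffices.
    return t1 or t2

def three_test(t1: bool, t2: bool, tiebreaker) -> bool:
    # Concordant dual-test results are final; a third test
    # (run only when needed) resolves discordant results.
    if t1 == t2:
        return t1
    return tiebreaker()

# A discordant specimen (test 1 positive, test 2 negative) illustrates
# why the two dual-test strategies diverge and how the tie-breaker decides:
assert specificity_optimized(True, False) is False
assert sensitivity_optimized(True, False) is True
assert three_test(True, False, lambda: True) is True
```

Note that, as in the text, only the specificity-optimized and three-test strategies can be run sequentially; the sensitivity-optimized strategy requires both tests on every specimen.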
The evaluation of performance in Western blot-defined HIV-1-positive and -negative panels omits the evaluation of Western blot-indeterminate specimens, because without follow-up, the infection status of these specimens is unknown (or undefined). The causes of indeterminate patterns on Western blot include technical artifact, laboratory error, irrelevant cross-reactions, nonspecific binding, infection with a related retrovirus such as HIV-2, the presence of an antigenic variant of HIV-1, early HIV-1 infection, or late-stage disease (3). Antiretroviral therapy reduces viral load and has been reported to reduce HIV-specific antibody (11). Presumably, patients on antiretroviral therapy would have been diagnosed before the initiation of therapy. There was no indication that the current tests were less sensitive for infection with non-B subtypes or variants. The likelihood that an indeterminate blot reflects true infection rises with higher HIV prevalence in the test population. In low-prevalence settings, most indeterminate blots are not from infected people (3). Of the 41 indeterminate blots encountered in this study, four were positive in the majority of the serologic assays and in all three NAAT. They would have registered positive or at least been flagged as discordant by most of the alternative algorithms. The remaining 37 indeterminates were negative in most of the serologic tests and in all three NAAT. They would have registered as concordantly negative by most test combinations and thus been subsumed into definitive negative results by most of the algorithms. This is consistent with the expectation that most Western blot-indeterminate specimens do not represent bona fide infection. However, without confirmed designation of infection status by follow-up, evaluating the performance of tests on these specimens is conjecture. As an alternative, we assembled panels of sera that frequently register an indeterminate blot result and are from infected people, i.e., specimens collected serially from patients with newly acquired HIV-1 infection and specimens from patients with HIV-2 infection. We do not have the converse type of panel: indeterminate blots from people known by follow-up not to be infected. This type of panel is difficult to assemble; it requires follow-up and would be biased by the test or tests used for initial screening.
In the evaluation of the seroconversion panel, the sequence and intervals of seroconversion are consistent with other studies (5). Since the sample size was small (183 specimens from 15 donors), the intervals between test reactivities may not be precise. However, since all tests were run on all specimens, the ranking or sequence of reactivity is comparable between tests. The Western blot was indeterminate about 9 days before it was positive (Fig. ). Most of the tests were already positive or became positive during this time, but there were differences in analytic sensitivity, as measured by how early the tests became reactive. The NAAT, which in the panel of established infection was less sensitive than the serologic tests, was the most sensitive for early infection, reflecting the fact that viral replication precedes seroconversion. The third-generation EIAs were decidedly more sensitive than the earlier-generation EIAs and than two of the four rapid tests. For tests that are positive before the Western blot becomes indeterminate or positive, algorithms employing these tests would have a substantial advantage over the conventional Western blot-based algorithm in diagnosing early infection or flagging it for further testing. The overall effect on HIV detection in a given diagnostic setting would depend on how many specimens in the test population were from early infection; higher-prevalence, emerging-epidemic, and higher-risk settings are likely to have more. Some indication of the relative numbers of early and established infections can be gleaned from surveys for primary HIV infection. In these studies, the population is screened by EIA, and EIA-negative specimens are screened by NAAT (2). The incremental yield of HIV-1 infections detected over that of serology alone ranges from 0 to 11% (33).
The potential role of NAAT in a diagnostic algorithm reflects the unique features of this technology. Ideally, screening and supplementary tests should be orthogonal; that is, they should differ sufficiently in format or content such that they are not prone to the same false-positive or false-negative effects. NAAT is an appealing addition in that it detects virus directly and uses technology that does not share features with the antibody tests. However, it is less sensitive for detection of established infection than the serologic tests (Table ) (1). On the other hand, it is more sensitive than the serologic tests for early infection (Fig. ) (1). Our data support the use of NAAT as a supplementary test for confirming antibody-positive sera and as a screen of antibody-negative sera for primary infection. To date, NAAT diagnostics have been reserved for niche applications where antibody is not present (detection of primary HIV infection) or where the presence of HIV antibody is uninformative (HIV diagnosis in infants). U.S. blood banks use NAAT to screen EIA-negative specimens for primary HIV infection. They also may use NAAT in their standard algorithm for evaluation of EIA-reactive screening tests (2). If the NAAT is positive on the EIA-reactive specimen, infection is confirmed without the need for a Western blot. If the NAAT is negative, the specimen is considered unresolved and undergoes further testing by the conventional algorithm (i.e., Western blotting). This is analogous to the three-test algorithm, where discordant results on the first two tests are resolved by a third test (Table , line 8 or 9). The data in Table indicate that an EIA tie-breaker would function as well as the Western blot, but not as well if discordants (EIA positive, NAAT negative) were registered as negative (Table , compare lines 8 and 9, the specificity-optimized dual-test algorithm with the three-test algorithm).
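The blood-bank resolution flow for an EIA-reactive specimen can be sketched as a small decision function (a hypothetical illustration; the result labels are ours):

```python
def resolve_eia_reactive(naat_positive: bool, western_blot=None) -> str:
    """Resolve an EIA-reactive specimen, per the blood-bank flow above.

    A positive NAAT confirms infection with no Western blot needed;
    a negative NAAT sends the specimen to the conventional algorithm.
    `western_blot` is a callable returning 'positive', 'indeterminate',
    or 'negative', and is run only when needed.
    """
    if naat_positive:
        return "confirmed"
    return "unresolved: " + western_blot()

print(resolve_eia_reactive(True))                       # confirmed
print(resolve_eia_reactive(False, lambda: "negative"))  # unresolved: negative
```

The deferred `western_blot` callable mirrors the sequential character of the flow: the more labor-intensive supplementary test is performed only for the discordant (EIA-positive, NAAT-negative) specimens.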
Regarding HIV-2 detection, all the tests that have an HIV-2 designation registered all 34 HIV-2 specimens as positive. For the other tests, the detection rate was variable (16 to 34 of 34). The NAAT do not detect HIV-2. In the current U.S. PHS algorithm, specific testing for HIV-2 is prompted by an indeterminate HIV-1 Western blot result for an EIA-reactive specimen or by clinical suspicion (specimens from symptomatic or exposed patients with links to West Africa) (9). This is not an entirely satisfactory process. Specimens from dually infected people would be reported as HIV-1 positive, and, rarely, HIV-2 specimens that are HIV-1 Western blot negative rather than indeterminate do occur. Combination HIV-1/2 tests target both HIV-1 and HIV-2, have high sensitivity for HIV-2, and do not rely on cross-reactivity with HIV-1. Thus, HIV-2 specimens would be expected to cotrack with HIV-1 specimens in most of the algorithms presented here, without a pattern that could be used as a flag for HIV-2 testing. If HIV-2 is a concern, all specimens that register positive in an alternative algorithm would have to be tested for HIV-2 with a discriminatory test such as Multispot. This may actually require less overall HIV-2 testing than is done in the conventional algorithm, where indeterminate Western blots are tested, because in low-prevalence settings indeterminate blots generally greatly outnumber positive blots (3).
Given the anticipated prevalence of established HIV-1 infection, early HIV-1 infection, and HIV-2 infection and the sensitivities/specificities reported here, the data presented may be used to project the accuracy, the number of tests required, and the cost of the respective algorithms for a given diagnostic setting. There is no recommended standard for acceptable algorithm performance. The FDA draft guidance for manufacturers seeking licensure of individual tests recommends demonstrating that the lower bound of the one-sided 95% confidence interval for sensitivity and for specificity exceeds 98%. For the sample size in our panel, this would require that the measured sensitivity and specificity exceed 98.6% and 98.8%, respectively. The test combinations exceeded this in at least one of the algorithm strategies, and most exceeded it in all strategies. As an alternative to minimum acceptable criteria, comparative algorithm performance could be used for the selection of appropriate diagnostic procedures. In general, for any given test combination, the three-test algorithm results in the highest net combination of sensitivity and specificity. Conversely, for any given algorithm, test combinations that include third-generation EIAs result in the highest sensitivity/specificity (Tables to ). However, a number of other test combinations or algorithms have performance that is not significantly inferior. Thus, from the standpoint of minimum performance criteria or of relative performance, these data support the implementation of alternative algorithms that do not include the Western blot, that yield fewer ambiguous results (discordants or indeterminates), that cost less, and that can accommodate special features of a testing program such as on-site (outreach) testing and screening for acute HIV infection.
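The FDA-style criterion can be checked numerically. Below is a minimal sketch using the one-sided 95% Wilson score lower bound for a proportion (an approximation; the paper's exact thresholds of 98.6% and 98.8% depend on its panel sizes, which are not reproduced here, and the guidance may call for an exact interval):

```python
import math

def wilson_lower_bound(successes: int, n: int, z: float = 1.645) -> float:
    """One-sided 95% Wilson score lower confidence bound for a proportion.

    z = 1.645 is the standard normal quantile for a one-sided 95% interval.
    """
    p = successes / n
    z2 = z * z
    center = p + z2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (center - margin) / (1 + z2 / n)

# Even a perfect observed result on a small panel can fail the 98% bar:
lb_100 = wilson_lower_bound(100, 100)   # ~0.974, below 0.98
# A larger panel with a high observed rate can clear it:
lb_600 = wilson_lower_bound(598, 600)
```

This illustrates why the measured point estimates must exceed the 98% target by a margin that shrinks as the panel grows: the criterion is on the interval's lower bound, not on the observed rate itself.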