In this investigation, incidence estimates based on cross-sectional data were affected by misclassification by the BED and Ax-AI assays. The unadjusted BED (6.5/100 PY) and Ax-AI (10.8/100 PY) incidence estimates were both substantially higher than our estimated prospective cohort 12-month incidence rate of 3.5/100 PY (95% CI: 1.6–5.4). Even after adjustment for poor specificity with CD4 data or sample-specific FRR, most assay-based estimates remained substantially higher than the overall estimated incidence rate for the prospective cohort. However, comparison with the incidence rate for the first 6 months of the cohort showed good correspondence with most STARHS-based estimates, particularly those for the combined BED/Ax-AI algorithm (both unadjusted and adjusted).
Comparisons between incidence estimates derived from the cross-sectional and prospective samples may be problematic for several reasons. First, by definition, cross-sectional and prospective incidence rates are estimated at different time points (over some time prior to baseline and during the months following baseline, respectively). Second, limited statistical power, as in the case of this analysis, will make meaningful comparisons between estimates difficult. Third, CD4- and FRR-adjustment strategies may not have fully corrected for misclassification in the cross-sectional sample, thereby leaving residual bias in the BED and Ax-AI incidence estimates. Fourth, selection bias may have caused the estimates to diverge, for example if there were differences in risk between women who enrolled in the prospective cohort and women in the survey sample who were eligible (i.e., HIV negative) but did not enroll. Indeed, cohort participants tended to be lower risk than non-enrolled women (data not shown). In addition, observation biases such as the Hawthorne effect or study-related risk-reduction interventions (e.g., condom provision, prevention counseling, STI treatment) may create artifactual differences in rates between prospective and cross-sectional samples. We did observe a non-significant downward trend in incidence in the prospective cohort during follow-up, which could be due to the Hawthorne effect and/or some effect of study interventions. While specific reasons for the downward trend are difficult to isolate, the trend supports using early (first 6 months) rates from the cohort as the most appropriate comparator for STARHS-based estimates.
The BED-FRR in this sample was lower than BED-FRRs reported for Zimbabwean and North American samples, but higher than the rate reported for a rural South African sample. Compositional, clinical (e.g., circulating HIV subtype or ART coverage), or biologic differences (e.g., disease progression) among the study populations could explain differences in the FRR. To our knowledge, this is the first publication of a false-recent rate for the Ax-AI method based on follow-up STARHS testing of HIV-positive survey participants. In this study, the FRR of the Ax-AI method was higher than the FRR of the BED assay. Although few studies have compared results from the two assays, one study in Côte d'Ivoire did report poorer specificity for the Ax-AI than for the BED in prospective study seroconverter panels. Poorer performance of the Ax-AI in this study, including its low correlation with the BED, could be due to suboptimal cross-reactivity with a range of HIV-1 subtypes.
Estimated mean window periods for the BED and Ax-AI assays among participants in the prospective sample were substantially longer than published window period values (330 vs. 155 days for the BED, and 310 vs. 180 days for the Ax-AI). Differences in mean window period may reflect underlying variability in the biologic response after infection with different HIV-1 subtypes. Indeed, a recent analysis of data from multiple HIV seroconversion cohorts with varying HIV subtypes estimated the overall mean BED window period to be 197 days, with longer window periods for African vs. non-African cohorts (Parekh et al., submitted). While the small sample size and lack of robust methods led to a high degree of uncertainty in our sample-specific window period estimates, they suggest that estimates may improve with use of the longer window periods, a finding that underscores the benefit of a locally derived window period. For example, using the manufacturer's window period of 155 days for the BED, the unadjusted incidence estimate is 13.9/100 PY (95% CI: 6.7–21.0), versus 6.5/100 PY (95% CI: 3.2–9.9) with our estimated local window period of 330 days. An ideal assay would be applicable to all HIV-1 subtypes, would not require modification of existing commercial assays, would be easy to transfer to field settings, and would be unaffected by changes in HIV antigen-specific antibodies associated with long-term infection.
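The sensitivity of the unadjusted estimate to the window period follows directly from the standard cross-sectional estimator, in which incidence is inversely proportional to the mean window. A minimal sketch, using hypothetical counts (R recent classifications among N HIV-negative participants; these are illustrative values, not the study's data):

```python
def cross_sectional_incidence(n_recent, n_negative, window_days):
    """Unadjusted STARHS incidence per person-year: I = R / (N_neg * omega)."""
    omega_years = window_days / 365.0
    return n_recent / (n_negative * omega_years)

# Hypothetical counts for illustration only:
R, N = 24, 400
i_local = cross_sectional_incidence(R, N, 330)  # locally estimated window
i_manuf = cross_sectional_incidence(R, N, 155)  # manufacturer's window
print(f"{100 * i_local:.1f} vs {100 * i_manuf:.1f} per 100 PY")
```

Because the estimate scales as 1/ω, substituting 155 days for 330 inflates it by a factor of 330/155 ≈ 2.1, consistent with the roughly two-fold gap between the 13.9 and 6.5 per 100 PY figures above.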
This is also the first report of a false-recent rate for the combined BED/Ax-AI algorithm. The FRR for the combined BED/Ax-AI algorithm (among ART-naïve participants) was lower than either individual assay FRR, at only 2.1%. Indeed, sequential testing with two STARHS assays is increasingly being recommended as a strategy for reducing misclassification and improving incidence estimates. However, with two assays that perform sub-optimally in a given population, there will be a trade-off between improved specificity and loss of sensitivity.
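The specificity/sensitivity trade-off of a serial algorithm can be seen with a back-of-the-envelope calculation. Under the strong (and in practice doubtful) assumption that the two assays misclassify independently, and using purely hypothetical per-assay FRRs and sensitivities:

```python
# Hypothetical per-assay values; real BED/Ax-AI errors are likely correlated,
# since both assays track antibody maturation after seroconversion.
frr_bed, frr_axai = 0.056, 0.078    # false-recent rates on long-term infection
sens_bed, sens_axai = 0.95, 0.90    # probability a true recent tests "recent"

# Serial algorithm: classified recent only if BOTH assays call "recent".
frr_serial = frr_bed * frr_axai      # specificity improves (FRR shrinks)
sens_serial = sens_bed * sens_axai   # sensitivity degrades
print(f"serial FRR: {frr_serial:.4f}, serial sensitivity: {sens_serial:.3f}")
```

Under independence the combined FRR drops sharply while sensitivity falls to the product of the individual sensitivities, which is the trade-off described above; correlated errors would attenuate the FRR gain.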
Availability of ART has increased rapidly in Rwanda during the past few years. As individuals taking ART may be misclassified on STARHS assays because of treatment-related changes in HIV antibody levels, misclassification rates in this population may increase over time as more individuals initiate treatment. The ART status of survey participants, especially those testing recent on the assays, should be measured systematically (e.g., by therapeutic drug monitoring (TDM), chart review, or self-report) so that individuals taking ART can be excluded from incidence analyses and FRR calculations.
Several factors were associated with testing false-recent on the assays among HIV-positive participants with known LTI, including classification as RI by the assays at baseline, a more frequent history of HIV testing, and older age (a borderline significant association). HIV testing history and older age were significantly positively associated with long-term HIV infection in this sample (data not shown). The association between testing false-recent and a prior STARHS classification of RI may reflect the presence of “assay non-progressors” in this sample, i.e., individuals who are repeatedly classified as RI by STARHS assays over time because of sustained low antibody levels. Further, false-recent classification by the Ax-AI could be due, in part, to infection with multiple HIV clades, wherein subsequent waves of antibody production maintain low antibody avidity. Our observation of a higher Ax-AI false-recent rate among participants with an HIV-positive partner, a frequent HIV-testing history, and a higher baseline CD4 count supports this hypothesis.
In this population, adjustment of STARHS-based incidence estimates with the FRR brought estimates closer to the gold-standard estimated cohort incidence rate than did adjustment using a CD4 cutoff of <200 cells/µl for probable LTI. Incidence surveys should use a locally derived, population-specific FRR rather than a published rate from a different population; indeed, STARHS-based estimation should be reconsidered when a local FRR is not available. While follow-up of a long-term infection cohort, as was done in this study, is the optimal method for estimating an FRR, false-recent rates can also be estimated in sufficiently large cross-sectional samples of ART-naïve individuals with long-term infection. In our study, CD4 adjustment also appeared to help reduce potential inflation of estimates, underscoring the value of CD4 data for adjusting and interpreting STARHS results, and thus the importance, where feasible, of incorporating CD4 count measurement into national or population-based serosurveys using STARHS to estimate HIV incidence. CD4 testing may be feasible, for example, in settings with enhanced clinical and laboratory capacity as a result of treatment scale-up. Ideally, assay FRR and CD4 count data, along with other clinical information, would be available for adjusting STARHS-based incidence estimates.
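One simple form of FRR adjustment subtracts the expected number of misclassified long-term infections from the observed recent count before applying the estimator. This is a sketch in the spirit of published corrections (e.g., Hargrove-style adjustments), not the study's exact procedure, and all counts below are hypothetical (only the 2.1% FRR echoes the combined-algorithm rate reported above):

```python
def frr_adjusted_incidence(n_recent, n_positive, n_negative, window_days, frr):
    """FRR-adjusted incidence: remove the expected false-recents among the
    long-term positives, then apply I = R_adj / (N_neg * omega)."""
    expected_false_recent = frr * (n_positive - n_recent)
    r_adj = max(n_recent - expected_false_recent, 0.0)
    return r_adj / (n_negative * window_days / 365.0)

# Hypothetical survey counts for illustration only:
i_adj = frr_adjusted_incidence(n_recent=24, n_positive=300, n_negative=400,
                               window_days=330, frr=0.021)
print(f"{100 * i_adj:.1f} per 100 PY")
```

Even a modest FRR removes a meaningful fraction of the "recent" count when long-term infections dominate the positive pool, pulling the estimate downward toward the cohort rate.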
This study has several strengths. The combined cross-sectional and prospective design enabled us to compare incidence estimates, derive population-specific FRRs for the assays, including the combined test algorithm, and estimate assay window periods from serial specimens from individuals with a known interval of HIV seroconversion. The use of two STARHS assays contributes important information about the assays' independent and relative performance in a high-risk setting where there has been little experience with STARHS. Discussion is ongoing regarding the optimal assay parameters (e.g., window periods and cutoff values, including use of a “grey zone” instead of a single value) for a combined BED/Ax-AI algorithm.
Study limitations are also noted. The small sample size of the study, and the relatively few HIV seroconversions and recent infection classifications, may have limited statistical power for certain analyses. Specifically, the small sample size, along with other study design features, prohibited the use of more robust statistical methods for comparing the cross-sectional and prospective incidence rates, such as equivalence tests, and may also have reduced precision around the assay FRR estimates. However, the statistical approach employed for incidence estimation does attempt to quantify the effect of uncertainty in the calibrating parameters (e.g., FRR and window period) on the incidence estimates. Furthermore, our approach to CD4 adjustment of STARHS classifications (using a cutoff of <200 cells/µl for LTI) may have erroneously excluded individuals with primary HIV infection and low CD4 counts from the RI classification. However, the lower limit of 287 CD4 cells/µl among recent seroconverters in this sample suggests that a CD4<200 adjustment cutoff would not exclude many individuals with true RI status from incidence estimates (indeed, there were no individuals with CD4<50 and RI status on the assays). Additionally, participants' ART status was assessed by self-report rather than by pharmacokinetic testing. However, women in the baseline survey were newly diagnosed with HIV by the study and so were assumed to be ART-naïve, and even at follow-up few women would have begun taking ART given the relatively short time since diagnosis. Finally, although some studies have shown that the Ax-AI method may be more specific than the BED assay, the ideal dual testing algorithm would include a confirmatory test with perfect specificity.
In this sample of Rwandan FSW, adjusted incidence estimates based on the combined BED/Ax-AI algorithm were similar to the estimated HIV incidence rate in the first 6 months of cohort follow-up, when incidence was highest. Furthermore, the false-recent rate of the combined BED/Ax-AI algorithm was low, and substantially lower than that of either assay alone. In population-based testing, the specificity of the BED and Ax-AI assays, and of the combined test algorithm, would be expected to be substantially higher, given that a larger proportion of individuals will have longer-term HIV infection.