The National HIV surveillance system has provided invaluable information for the monitoring of the US epidemic since its inception, including insights on the changing demographics and transmission categories, changes in prevalence and mortality, and estimation of HIV incidence [12, 23, 24]. A study in the San Francisco DOH found that monitoring of VL and CD4 count evaluations from laboratory reporting increased detection of initial primary care visits by 25% above standard HIV public health investigation [19], indicating that complete reporting of laboratory data to the DOH offers increasingly complete capture for monitoring care visits in known HIV cases. HIV surveillance data have also been used to assess the size of the diagnosed HIV-infected populations currently living in defined jurisdictions, and to map ecologic trends in newly diagnosed infections and transmission potential in a community based on the measured and imputed aggregate mean VL of the HIV-infected population [25, 26]. In addition, increasing confidence in the surveillance data has led the CDC to support local health departments to initiate programs to increase HIV testing and strengthen linkage-to-care and treatment.
Conducting site-randomized trials is a complex undertaking, requiring careful discernment to define the cohort to be assessed and to establish data systems for measuring site-level outcomes [27]. Surveillance data systems can provide comprehensive and accurate estimates of site-level measures and key epidemic indicators for cases, which makes such trials more operationally feasible and potentially less expensive, especially when they draw on the complete capture of laboratory-based state and local public health surveillance systems.
Surveillance data can only be used to evaluate public health interventions in which study outcomes are aggregated across patients because no individual patient data that originate in surveillance can be used for patient-level research. Measurement of an intervention effect requires outcomes that are reliably captured within the surveillance population (e.g., laboratory measures and events at diagnosis in known cases of HIV infection) and can be aggregated across the chosen site-randomized patient population. Thus, for example, surveillance data cannot measure the impact of an intervention to increase HIV testing at a site, because the number of HIV tests at a site and the HIV-negative results are not captured in the HIV case surveillance system.
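The aggregation requirement described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout (site, patient, draw date, viral load) and a hypothetical suppression cutoff of 400 copies/mL; neither is taken from the HPTN 065 protocol or from any surveillance system specification. Only the site-level counts and proportions leave the function, so no patient-level result is exposed.

```python
from collections import defaultdict

# Assumed cutoff (copies/mL) for illustration only; not the study's definition.
SUPPRESSION_THRESHOLD = 400


def site_suppression_proportions(records, threshold=SUPPRESSION_THRESHOLD):
    """Aggregate laboratory records to site-level suppression proportions.

    records: iterable of (site_id, patient_id, draw_date, viral_load),
             with draw_date as an ISO 8601 string (YYYY-MM-DD).
    Returns {site_id: (n_patients, proportion_suppressed)}.
    """
    # Keep only the most recent viral load per patient within each site.
    latest = {}
    for site_id, patient_id, draw_date, viral_load in records:
        key = (site_id, patient_id)
        if key not in latest or draw_date > latest[key][0]:
            latest[key] = (draw_date, viral_load)

    # Count patients and suppressed patients per site.
    counts = defaultdict(lambda: [0, 0])
    for (site_id, _patient), (_date, viral_load) in latest.items():
        counts[site_id][0] += 1
        if viral_load <= threshold:
            counts[site_id][1] += 1

    return {site: (n, suppressed / n) for site, (n, suppressed) in counts.items()}


# Hypothetical example data: two sites, three patients.
example = [
    ("site_A", "p1", "2009-03-01", 12000),
    ("site_A", "p1", "2009-09-15", 50),     # later result supersedes the first
    ("site_A", "p2", "2009-06-10", 2500),
    ("site_B", "p3", "2009-07-22", 75),
]
result = site_suppression_proportions(example)
```

In this toy example, `result["site_A"]` is `(2, 0.5)` and `result["site_B"]` is `(1, 1.0)`.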
A significant advantage of using existing, ongoing data systems is standardized, reproducible baseline data that can be used to inform design parameters of a site-randomized trial. This can significantly increase the power of the study to detect an effect [28]. Baseline levels of study outcomes can be used to conduct a randomization with a higher probability of balance, protecting both power and Type I error. Estimates of the intraclass correlation, a parameter rarely available at the design stage of a site-randomized trial, also help ensure accurate power assessment in the trial. In the HPTN 065 viral suppression component, for example, the relative consistency of baseline proportions of VL suppression suggests that the study is likely to have adequate power to detect a 6% increase in VL suppression. In comparison, the variability among test sites in both numbers of new cases and linkage-to-care proportions presents a scenario where the study could only reliably detect an increase greater than 13%. However, it must be cautioned that the 1-2 year lag between the baseline surveillance data and trial implementation makes the randomization vulnerable to unmeasured and/or as-yet-unknown administrative and mission changes in HIV test and care sites.
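To show why a baseline estimate of the intraclass correlation matters for power, the following sketch applies the standard design-effect approximation, 1 + (m - 1) x ICC, to a two-arm comparison of proportions. It is a simplified illustration under stated assumptions (equal cluster sizes, a normal approximation, and made-up numbers of sites and patients); it is not the power calculation used for HPTN 065, and all function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def design_effect(mean_cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def power_two_proportions(p_control, p_intervention, sites_per_arm,
                          mean_cluster_size, icc, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions
    in a site-randomized design, using the design-effect adjustment."""
    deff = design_effect(mean_cluster_size, icc)
    # Effective number of independent patients per arm.
    n_eff = sites_per_arm * mean_cluster_size / deff
    se = math.sqrt(p_control * (1 - p_control) / n_eff
                   + p_intervention * (1 - p_intervention) / n_eff)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_intervention - p_control) / se
    return NormalDist().cdf(z_effect - z_alpha)


# Illustrative inputs only: 20 sites per arm, 200 patients per site,
# and a 6-percentage-point increase from an assumed 60% baseline.
power_low_icc = power_two_proportions(0.60, 0.66, 20, 200, icc=0.01)
power_high_icc = power_two_proportions(0.60, 0.66, 20, 200, icc=0.05)
```

With the same sites and effect size, the larger assumed ICC yields markedly lower power, which is why an ICC estimated from baseline surveillance data, rather than guessed, makes the design-stage power assessment more trustworthy.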
The standardization of surveillance systems across the US permits “ecologic” comparisons of trends in non-intervention cities with the trends observed in the intervention communities in HPTN 065. With a national surveillance system, the expense of implementing study-specific data collection in other cities is avoided, although resources are needed for additional programming to implement the study-specific definitions of cohorts, outcomes, and evaluation periods at the non-intervention DOH.
While using surveillance systems has many advantages, achieving high completeness in study outcome measures may require a significant reporting delay. All data systems experience reporting lag, but the passive nature of surveillance leads to a longer lag than in dedicated trial data collection systems, potentially resulting in a significant delay before the reporting of interim and final trial results. It is typical for annual surveillance data reporting new diagnoses of HIV to be finalized after one year, as reflected in the use of 2008 and 2009 data for study implementation in 2010 and 2011. The most significant lag currently occurs with verification of new diagnoses: the laboratory report that identifies an individual as a possible new diagnosis of HIV must be de-duplicated and matched to the national surveillance registry and, if not matched, a field investigation must be completed to confirm the date and disposition of diagnosis and collect all other data required for surveillance. Both CDC and the state and local surveillance authorities have determined that a nine-month period between the date of initial HIV diagnosis (the draw date of the blood for the test that confirmed the HIV diagnosis) and confirmation of diagnosis and entry into the registry is required before reporting is >90% complete. However, reporting of laboratory data to update an existing patient registry record, used for the viral suppression component of HPTN 065, is considerably more rapid. For example, reporting is complete within three months of draw date for laboratory results (CD4 and/or VL) from known HIV patients in jurisdictions with electronic laboratory reporting. Case migration is also a challenge: there is inevitably a lag in resolving data, particularly date of initial diagnosis, from cases relocated from elsewhere, or cases accessing care in the study jurisdiction but residing in a different jurisdiction.
De-duplication across jurisdictions of data captured in the state and local surveillance systems occurs through the national CDC database.
A known challenge in assessing aggregate site outcomes is the requirement for specificity and completeness of site location fields on the laboratory requisition that allows surveillance to track laboratory results back to the ordering site. This relies on the procedures of participating sites, and coordination between providers and the DOH. While our success in obtaining and utilizing site-specific data for baseline measures establishes proof of concept for measuring site-level outcomes through the HIV surveillance systems, the data were known to imperfectly identify site for a substantial fraction of laboratory reports; for example, self-reported baseline data were used in the design stage for two care sites that are part of a large multi-site care system because of known problems with identifying the specific site from which the laboratory specimen emanated in the surveillance data. Improvements in the consistency of site identifying fields will be required to ensure accurate attribution of data to specific site locations for HPTN 065 study outcomes, as lack of precision will threaten the ability to detect change resulting from the interventions. Additional staffing resources for acquiring site attribution for study outcomes are being provided to the Departments of Health of participating jurisdictions.
Use of the surveillance data to assess aggregate site outcomes would not be possible in jurisdictions without reporting of all laboratory values, with poor quality monitoring of laboratory data, or with large data entry backlogs. All states have mandatory name-based laboratory reporting of positive HIV test results, but only 28 states, Washington, D.C., and Puerto Rico currently have mandatory reporting of all CD4 and VL values; some states only require CD4<200 or detectable VL to be reported. Some jurisdictions, including New York City, also have mandated electronic reporting to the DOH, which is clearly an advantage for achieving complete and timely reporting. Many jurisdictions are improving their systems to utilize electronic reporting from laboratories to ease the upload of data into the surveillance system. HPTN 065 is evaluating the use of surveillance data in an additional four communities that serve as non-intervention communities in this study (Houston, TX; Miami, FL; Chicago, IL; Philadelphia, PA). By evaluating the issues encountered across a broad range of cities affected by the HIV epidemic, we plan to assess the human and technical resources required to use surveillance data for evaluating public health strategies in site-randomized studies.
HPTN 065 benefits from extensive CDC investment in the DOH HIV surveillance systems of each of these six communities: each jurisdiction has created and executed HIV testing campaigns in the community, taking advantage of additional funding opportunities available through the CDC following the publication of the revised CDC testing recommendations in 2006 [29]. A collateral benefit of these CDC-funded activities is good working relationships between DOH and HIV care and testing providers, and detailed prior knowledge of HIV test and care facilities throughout the community. This is reflected in the high participation rates amongst selected sites in both intervention cities in our study.