Several randomized, controlled trials (RCTs) have tested strategies to prevent sexual acquisition of HIV infection, but their quality has been variable. We aimed to identify, describe, and evaluate the quality of RCTs studying biomedical interventions to prevent HIV acquisition by sexual transmission.
We conducted a systematic review to identify all RCTs evaluating the efficacy of biomedical HIV prevention interventions. We assessed seven generic and content-specific quality components important in HIV prevention trials, factors influencing study power, co-interventions provided, and trial ethics.
We identified 26 eligible RCTs. The median number of quality components judged to be inadequate or unclear was 3 (range, 1-4) in 1992-1998, 3 (range, 1-4) in 1999-2003, and 0 (range 0-2) in 2004-2008 (p < 0.001). Common problems that may have biased results included low retention (median 84%), poor adherence to interventions requiring ongoing use (median ≤78%), and lower HIV incidence than expected a priori (in 8 of 11 trials where evaluable).
Reporting of trials of biomedical HIV prevention interventions has improved over time. However, quality improvement is needed in several key areas that influence study power, including participant retention, adherence to interventions, and estimation of expected HIV incidence.
In 2007, 2.7 million adults became infected with the human immunodeficiency virus (HIV) worldwide, predominantly through sexual transmission. Because behavioral risk reduction programs have had limited effectiveness at reducing sexual transmission [2-5], more effective methods to decrease this risk are urgently needed. One biomedical intervention, male circumcision [6-8], has demonstrated efficacy in randomized controlled trials (RCTs). Other interventions, such as sexually transmitted disease (STD) control, diaphragm use, microbicides, pre-exposure prophylaxis with antiretroviral agents, and candidate vaccines, have yielded inconclusive, equivocal, or negative results to date [9, 10]. These RCTs have yielded important lessons about the design of effective interventions, and have also taught us the importance of HIV prevention trial quality.
Opportunities for rigorous evaluation of HIV prevention interventions are limited, since such studies tend to be both logistically challenging and very costly. Accordingly, trials must be of high quality to ensure efficient use of research funds. Systematic reviews of specific preventive interventions for HIV have been published or are planned [9, 11-13]; however, a systematic review of the quality of biomedical intervention trials has not been conducted. Our objective was to identify, describe, and evaluate the quality of all RCTs that have investigated the efficacy of biomedical interventions to prevent sexual acquisition of HIV. We aimed to determine whether trial quality had improved over time and to highlight common challenges to be addressed in future trials.
We included RCTs of preventive interventions aimed at reducing the incidence of sexually transmitted HIV infections in adolescents and adults. The unit of randomization could be either individuals or clusters of individuals (e.g., communities). We excluded studies that used historical controls.
We included studies of adolescent or adult populations. Studies of pregnant women were only included when a primary objective of the study was the prevention of HIV acquisition in the mother (e.g., from a sexual partner) and not to prevent mother-to-child transmission. Studies that enrolled injection drug users were included if the intervention tested was also hypothesized to prevent sexual transmission.
We included biomedical interventions for prevention of HIV transmission. We excluded interventions aimed at behavior change, which included interventions to promote Abstinence, Being faithful to a single partner, or using Condoms (the so-called “ABC’s” of HIV prevention) or targeting other risk behaviors such as decreasing needle sharing among injection drug users. Interventions aimed at secondary prevention of HIV transmission (e.g., risk reduction or biomedical interventions targeting HIV-seropositive persons) were excluded. We included studies where the comparators were placebo, routine HIV prevention services, or deferral of the intervention.
We included only trials where HIV incidence was a prespecified primary or secondary outcome measure.
With the help of an expert librarian, electronic databases of Medline (1950 – March 2009), Embase (1980 – March 2009), and the Cochrane Central Register of Controlled Trials (through March 2009) were searched for eligible articles. We also reviewed abstracts presented from March 2007 through March 2009 at four major HIV conferences: the International Acquired Immune Deficiency Syndrome (AIDS) conference, International AIDS Society conference, the AIDS Vaccine conference, and the Conference on Retroviruses and Opportunistic Infections. We reviewed the reference lists of identified trials and recent review articles, as well as preliminary data from registered trials. An example of the search terms used (for Medline) is presented in Supplemental Information 1.
One author (SMG) scanned the titles of all identified studies for obvious exclusions. Potential abstracts were reviewed for eligibility and selection of articles for further review, using strict inclusion and exclusion criteria as described above. We reviewed the full texts of all remaining published articles and extracted data from eligible studies using a standardized form. When trials referenced previous publications containing details of trial design or baseline characteristics, we reviewed the full text of the earlier article. Data were independently extracted by two reviewers (SMG, ZCVA). Discrepancies regarding non-quality items were resolved through consensus while disagreements regarding quality criteria were resolved by an independent third review (PS) and subsequent consensus.
To evaluate trial quality, we focused on important quality components as recommended by the CONSORT guidelines [14, 15]. We classified each component of methodologic quality as adequate, inadequate, or unclear, using the criteria of the Cochrane Collaboration’s “risk of bias” tool, tailored specifically for HIV prevention trials. We rated seven quality components for each trial (Table 1): allocation sequence generation, allocation concealment, blinding of assessors and subjects, handling of attrition, non-selective reporting, sample size estimation, and inclusion of subjects in analysis. We noted when blinding was not possible due to the nature of the intervention or to the lack of an adequate placebo at the time, and did not classify this as inadequate.
We also summarized data on additional factors that may have introduced bias by reducing trial power. Data related to study power included number of participants enrolled, target enrolment, percent of target met, number per group as randomized, actual versus planned follow-up duration, loss to follow-up per group, and percent retention. Where actual follow-up was not specifically reported, mean follow-up was calculated as the total number of person-years divided by the total number of participants in the primary analysis. Reported median adherence in the intervention group and the method of adherence assessment (e.g., self report, biologic testing) were noted. Reported condom use in each group after follow-up was abstracted, selecting the latest reporting period available. We classified contamination as definite when it was reported in trial results and as possible if the intervention could have been obtained outside the trial or from another participant. Incidence and expected incidence in the control group were abstracted as presented in the trial publication. If one was expressed as a proportion (e.g., 4% at 2 years) and the other as a rate (e.g., 2/100 person-years of observation), these data were presented but no direct comparison made.
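The follow-up calculation and the rate-versus-proportion rule described above can be sketched as follows. This is a minimal illustration only: the `Incidence` type, the function names, and the example numbers are hypothetical and do not come from any of the reviewed trials.

```python
from dataclasses import dataclass


@dataclass
class Incidence:
    value: float
    kind: str  # "rate" (per 100 person-years) or "proportion" (cumulative %)


def mean_follow_up_years(total_person_years: float, n_participants: int) -> float:
    """Mean follow-up per participant in years: person-years / participants."""
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    return total_person_years / n_participants


def directly_comparable(observed: Incidence, expected: Incidence) -> bool:
    """A rate and a cumulative proportion are both reported but not compared."""
    return observed.kind == expected.kind


# Hypothetical trial: 3,000 person-years among 2,000 analyzed participants.
follow_up = mean_follow_up_years(3000, 2000)  # 1.5 years, i.e. 18 months

# Hypothetical mismatch: 4% at 2 years (proportion) vs 2/100 person-years (rate).
mixed = directly_comparable(Incidence(4.0, "proportion"), Incidence(2.0, "rate"))
```

Here `mixed` is false, which corresponds to the case where both figures are presented without a direct comparison.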
Prevention interventions offered to all study participants were viewed as co-interventions, since they also may have the effect of reducing study outcomes. These co-interventions included HIV counseling and testing, condom provision, screening and treatment for STDs, and needle exchange for intravenous drug users. In addition, information was noted regarding care provided to both ineligible candidates and participants, informed consent, and ethical approvals.
We summarized trial attributes using medians and ranges for continuous data, and frequencies and percentages for categorical data. Because preliminary graphs showed a gradual improvement in quality with no specific change point, we divided the trials into 3 periods of 5-7 years each: 1992-1998 (n = 3), 1999-2003 (n = 6), 2004-2008 (n = 17). Trends over time in reporting of quality components were analyzed with Cuzick’s nonparametric test for trend across ordered groups for binary variables and Kruskal–Wallis tests for continuous variables. Comparison of actual versus expected incidence in control groups was made using the Wilcoxon signed ranks test. Heterogeneity of interventions precluded a summary statistic of effectiveness. All analyses were performed using Stata (version 9.0, Stata Corporation, College Station, Texas, USA).
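For readers unfamiliar with the trend test, the sketch below re-implements Cuzick's statistic in plain Python using the standard rank-sum formulation with a correction for ties. It is not the authors' Stata code, and the function name `cuzick_trend` is our own. The example data are the counts reported in the Results for adequate allocation sequence generation (2 of 3, 3 of 6, and 16 of 17 trials across the three periods), coded 1 for adequate and 0 otherwise.

```python
import math
from collections import Counter


def cuzick_trend(groups, scores=None):
    """Cuzick's test for trend across ordered groups (tie-corrected).

    groups: list of lists of observations, in the hypothesized order.
    Returns (z, two_sided_p). Undefined if all observations are identical.
    """
    if scores is None:
        scores = list(range(1, len(groups) + 1))
    values = [v for g in groups for v in g]
    n = len(values)

    # Assign midranks so that tied observations share their average rank.
    counts = Counter(values)
    midrank, start = {}, 1
    for v in sorted(counts):
        midrank[v] = start + (counts[v] - 1) / 2
        start += counts[v]

    t_stat = sum(s * sum(midrank[v] for v in g) for s, g in zip(scores, groups))
    l_sum = sum(s * len(g) for s, g in zip(scores, groups))
    l_sq = sum(s * s * len(g) for s, g in zip(scores, groups))

    expected = (n + 1) / 2 * l_sum
    variance = (n + 1) / 12 * (n * l_sq - l_sum ** 2)
    tie_factor = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)

    z = (t_stat - expected) / math.sqrt(variance * tie_factor)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


periods = [
    [1, 1, 0],             # 1992-1998: 2 of 3 adequate
    [1, 1, 1, 0, 0, 0],    # 1999-2003: 3 of 6 adequate
    [1] * 16 + [0],        # 2004-2008: 16 of 17 adequate
]
z, p = cuzick_trend(periods)
```

With these inputs the sketch gives z of about 1.90 and a two-sided p of about 0.058, in line with the value reported for this component.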
We identified 26 full-text articles and 2 abstracts reporting RCTs that met our search criteria (Figure 1). Over 109,000 participants were enrolled in these trials. Microbicides (n = 11) and STD control interventions (n = 8) were the most frequently tested strategies (Table 2). No quasi-randomized studies (randomized based on days of week, odd/even date of birth, hospital number, etc.) were identified. Twenty-five studies compared a single intervention with a control group; two studies had three arms: one community-randomized trial compared both a behavioral intervention alone and the same intervention with improved STD control to routine services, and one microbicide trial compared two different gel products to placebo or no gel.
The median sample size in the included studies was 2,350 (range, 138 – 20,516), and the median duration of follow-up was 18 months (range, 6 – 36 months). Overall, 13 of the 28 trials were terminated early (Table 2). All three trials of male circumcision were stopped early due to strongly positive findings of a preventive effect [6-8]. At least two microbicide interventions may have increased HIV risk: a trial testing a cellulose sulfate gel was stopped early for harm, while an earlier trial of a nonoxynol-9-containing gel was completed but also reported a higher risk of HIV-1 acquisition in more frequent users. These findings led to the early stopping of an additional RCT, and confirmed the lack of efficacy in previous studies [23-25], two of which had been stopped for futility [23, 24]. Several additional RCTs were stopped for futility: two RCTs and one arm of an ongoing trial that tested newer microbicide candidates [26, 27, 28], one STD control trial, and two vaccine trials [30, 31]. The only published study of pre-exposure prophylaxis was stopped after two trial sites were closed, one due to concerns from the host country about the standard of post-trial care for seroconverters, and the other due to repeated noncompliance with the protocol by research staff at the site.
We excluded two abstracts from detailed quality review because we were unable to obtain information regarding trial details [19, 31]. We found that most of the twenty-six included studies adhered to CONSORT guidelines for the reporting of results (Table 3) [14, 15]. In all studies, follow-up and outcome ascertainment procedures were the same in each randomized arm. In addition, all trials achieved baseline balance in important participant characteristics in the study arms. A modified intention-to-treat analysis, including only participants with at least one follow-up evaluation and excluding participants found to be ineligible after randomization, was presented for every RCT assessed; therefore, inclusion in analysis was adequate in all RCTs and is not presented in the table. We did not judge blinding to be inadequate or unclear when it was not possible. Of seven quality components, the median number judged inadequate or unclear across trials was 1 (range, 0-4). The median number of components judged inadequate or unclear was 3 (range, 1-4) in 1992-1998, 3 (range, 1-4) in 1999-2003, and 0 (range 0-2) in 2004-2008 (p < 0.001). This improvement in trial quality is primarily driven by the difference between the two earlier periods and the most recent period (p=0.0002). Each quality component is discussed briefly below.
Allocation sequence generation and allocation concealment were adequate when reported. Blinding of participants was often not possible due to the nature of the intervention (male circumcision [6-8], diaphragm, and community STD interventions [18, 34, 35]). Blinding was judged inadequate in one microbicide trial (comparing a sponge to a placebo gel that was switched to a cream). One study reported problems with false positive ELISA results at its first interim analysis, but there is no indication that this was related to problems with blinding. The proportion of trials reporting adequate allocation sequence generation increased over time (2 of 3, 3 of 6, and 16 of 17 in each time period, p=0.058), as did reporting of allocation concealment (1 of 3, 2 of 6, and 14 of 17; p=0.025) but not adequacy of blinding when this was possible (1 of 2, 2 of 2, 12 of 12; p=0.831).
In all included RCTs, participants were deemed lost to follow-up when they could not be reached despite tracing efforts after missed visits. Other participants withdrew, discontinued participation due to various reasons, or were known to have died. Exclusions were made if no HIV test was available during follow-up (i.e., outcome ascertainment not possible) or if the participant had a positive PCR test for HIV at enrolment (i.e., should have been excluded but test not available in real time). Balance across arms in attrition rates (including deaths) and exclusions was adequate in nineteen studies, but unclear in seven studies. In the four community-randomized studies, participants included both HIV-infected and HIV-uninfected persons, although HIV incidence during follow-up was analyzed in the initially uninfected subgroup. In each of these trials, no data are presented on trial balance specifically in the HIV-uninfected subgroup, making it impossible to say whether differential loss-to-follow-up occurred in the intention-to-treat cohort [18, 29, 34, 35].
Two circumcision trials and one microbicide trial reported differential loss to follow-up; in all three cases, it was unclear if this difference was related to the study outcome [6, 7, 23]. Since blinding was not possible in the circumcision trials and was inadequate in the microbicide trial, small differences in loss to follow-up could indicate attrition bias. For example, in the microbicide trial, women assigned to a nonoxynol-9 sponge had lower follow-up rates. It is conceivable that genital irritation caused by the sponge was more severe in women with more frequent use and that more frequent use was related to both HIV risk and drop-out, thereby biasing results. Adequacy of reported handling of attrition did not improve significantly over time (1 of 3, 4 of 6, and 14 of 17 in each time period, p = 0.083).
Fifteen of the 26 studies were judged free of selective reporting based on trial registration with posted information on objectives and planned analyses; six of these studies made the trial protocol available as supplemental information with the trial publication. Five additional trials had published details of the study design prior to trial completion [18, 29, 34, 36, 37]. Trials that had no publication on study design and were not registered could not be adequately reviewed and were judged unclear. One vaccine trial reported a subgroup analysis demonstrating efficacy among ethnic minority participants despite a negative overall result. These analyses have been criticized for limited subgroup sample size, but were in fact planned a priori. The number of RCTs with protocol information available prior to publication increased significantly over time (1 of 3, 2 of 6, and 17 of 17 in each time period, p = 0.001).
Two studies were determined inadequate with respect to sample size reporting because they did not include a rationale for the numbers enrolled [24, 39]. There was no significant difference in reporting of sample size over time (3 of 3, 4 of 6, and 17 of 17 in each time period, p = 0.262).
We collected information on other factors that may have biased results by decreasing power to detect an important effect, as summarized in Table 4. Common problems that may have biased results included low retention, poor adherence to interventions, and lower HIV incidence than expected. These and other factors are discussed below.
All but three studies reported data on individuals who were screened but did not enroll [23, 24, 25]. Six of twenty-four trials for which target sample size information was available did not enroll the intended number of participants. Reasons were early stopping [20, 22, 32] and slow enrolment, necessitating an extension of recruitment or the follow-up duration [35, 40, 41]. Two studies with slow enrolment had more endpoints than expected and so were judged to have sufficient power [40, 41]. One study was stopped due to slow enrolment and low retention of participants. Median retention in these RCTs was 84% (IQR, 72%-89%). Retention of at least 80% of participants has been a customary goal for clinical trials; 11 of the 26 RCTs included had a retention rate <80%.
The definition of adherence and method of assessment varied by intervention (e.g., direct observation of circumcision status, self-reported microbicide use). Median adherence in the intervention group was 94% for circumcision, 90% for STD control (when this could be individually assessed), 84% for vaccination, 78% for microbicides, 74% for pre-exposure prophylaxis, and 73% for the diaphragm (Table 4). In trials in which a more stringent biologic method was used to supplement self-report or pill count for interventions requiring daily administration, adherence was lower by the more stringent method (i.e., 96% by self-report vs. 41% by applicator testing; 90% by pill count vs. 33%-67% by urine testing) [41, 43].
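One way to see why adherence matters for study power is a simple dilution model. The sketch below assumes, purely for illustration, that non-adherent participants receive no protection and that adherence is unrelated to underlying HIV risk. Neither assumption comes from the trials reviewed, the 50% efficacy input is hypothetical, and `diluted_efficacy` is our own helper name. The adherence values are the medians reported above.

```python
def diluted_efficacy(efficacy_if_used: float, adherence: float) -> float:
    """Apparent intention-to-treat efficacy under a simple dilution model.

    Assumes non-adherent participants get no benefit and that adherence is
    independent of baseline risk (illustrative assumptions only).
    """
    if not (0 <= efficacy_if_used <= 1 and 0 <= adherence <= 1):
        raise ValueError("inputs must be proportions between 0 and 1")
    return efficacy_if_used * adherence


# Median adherence by intervention type, as reported in the text above.
MEDIAN_ADHERENCE = {
    "circumcision": 0.94,
    "STD control": 0.90,
    "vaccination": 0.84,
    "microbicides": 0.78,
    "pre-exposure prophylaxis": 0.74,
    "diaphragm": 0.73,
}

# Hypothetical intervention that halves risk when actually used.
apparent = {
    name: round(diluted_efficacy(0.50, adherence), 3)
    for name, adherence in MEDIAN_ADHERENCE.items()
}
```

Under these assumptions, a product that halves risk when used would show an apparent reduction of about 39% at 78% adherence, and less if the lower adherence estimates from biologic testing are closer to the truth.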
The median target reduction in HIV incidence was 50% (IQR, 50% – 50%; range, 33% – 75%). Actual incidence was lower than expected in 8/11 trials (73%) for which this was evaluable, although this difference did not reach significance (p = 0.083). Sample size was increased due to low HIV incidence in three studies [33, 36, 43], and follow-up was prolonged in two of these [36, 43]. Two microbicide studies were halted because they would have needed to enroll too many additional women, given incidence in the ongoing trial [26, 27].
All but one trial reported provision of HIV counseling and testing to trial participants. Condom provision was reported in all but three trials [30, 34, 37] and STD care was provided in all but four trials [30, 32, 37, 44]. Vaccine trials were less likely to report routine provision of condoms and STD treatment to participants [30, 37, 44]. Of note, the only trial that recruited injection drug users could not provide clean needle exchange at study sites, as this was illegal in the host country. HIV care referral was not specified by any trial published prior to 2005; after this year, all but two trials specified that referral of screenees and participants testing HIV-seropositive for care was available [35, 40]. Two trials included partner STD treatment as a benefit to study participants [7, 8]. All studies specified that informed consent was sought from participants, and all but one RCT provided information on relevant ethical approvals.
No comprehensive review of the quality of HIV-1 prevention trials has been conducted previously. Where meta-analyses of specific interventions have been conducted, detailed quality assessment was not a key feature [9, 11, 13]. For example, a meta-analysis of nonoxynol-9-containing microbicides for HIV prevention considered the quality of the 5 included trials to be fair to high, but only ratings on allocation concealment were presented. Narrative reviews representing expert opinion on HIV prevention trials have recently been published [10, 45], but have included little data abstracted from trials. This is the first study to evaluate the quality of these trials as a group, despite their similar objective (HIV prevention), target population (adults at high risk for sexual acquisition), and design (long-term follow-up with periodic HIV testing after randomization).
Our systematic review of trials that evaluated biomedical interventions to reduce HIV acquisition risk found evidence that the overall quality of HIV prevention trial reporting has improved over time, with significant improvement in several key areas, including allocation concealment and availability of protocol information by publication or registration before study completion. Although the CONSORT statement has been available since 1996, its adoption has increased more recently [14, 15, 46]. Improvement in trial reports is likely attributable at least in part to clear guidance on the requirements for quality trial reporting. The requirement that RCTs be registered before enrolling subjects has encouraged clear documentation of study objectives, endpoints, and an overview of planned analyses. Registration and on-line posting of protocols should further reduce the likelihood of selective reporting in this as in other fields [47-49]. Because full trial protocols were available for review for only a minority of RCTs, we are unable to say whether improved quality ratings are also due to better trial design.
One potential criticism of these prevention studies is the lack of blinding. Blinding was impossible in eight studies and inadequate in one study. We have not judged as inadequate studies where blinding was not possible. In some situations, blinding may not be considered desirable (for instance, when changes in behavior may be induced in both groups under study); one recent microbicide study included a blinded placebo gel arm and an open-label “no gel” arm to address this concern. However, unblinded studies may be biased if the knowledge of treatment allocation affects participant outcomes through differential co-interventions, cross-over, drop-outs, or outcome ascertainment. Outcome assessment and data analysis were rigorously blinded in these RCTs, but participant and research staff unblinding may have led to differences in risk reduction counseling or individual behavior; this possibility was discussed in several RCT reports [6, 7, 33, 50]. In such cases, it is critical to evaluate imbalances in attrition, adherence, or reported risk behaviors and conduct a sensitivity analysis to determine the degree to which bias may have influenced results.
Our review indicated that a significant issue for HIV prevention studies was adequate sample size. The anticipated reduction in HIV incidence was 50% in the vast majority of trials, with only 3 RCTs designed to detect a smaller difference [29, 33, 35] and one small RCT, perhaps unrealistically, expecting a 75% reduction. Therefore, most trials were not powered to detect a moderate effect (e.g., 30%-40%) that could be clinically important and may have been more realistic for the intervention tested. Factors exacerbating this problem include under-enrollment, high attrition, poor adherence to interventions, and contamination or cross-over between groups. Strong adherence promotion and monitoring are crucial to future trials of microbicides or oral prophylaxis regimens [32, 40, 43, 51], although such measures may limit the applicability of trial results. High retention is a key goal regardless of intervention. To compound these challenges, many trials had a lower HIV incidence than anticipated, a decline which may have a temporal component in several trials [21, 23, 33, 37, 41]. Observed low HIV incidence may be due to effective co-interventions, the Hawthorne effect (in which behavior improves under observation), attrition bias (in which high-risk subjects are lost to follow-up), or a combination of factors [33, 40, 52]. These effects must be taken into account in planning trial size, and should be based on feasibility studies of the target population where possible.
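To make the power argument concrete, the sketch below uses Schoenfeld's approximation for the number of endpoint events needed in a two-arm trial with 1:1 allocation, then converts events into person-years for a given control-group incidence. This is a generic textbook approximation, not the method used by any particular trial reviewed here; the function names and the incidence figures of 3 and 2 per 100 person-years are hypothetical.

```python
import math
from statistics import NormalDist


def events_required(relative_reduction: float, alpha: float = 0.05,
                    power: float = 0.80) -> int:
    """Schoenfeld approximation: total events for 1:1 allocation, two-sided alpha."""
    if not 0 < relative_reduction < 1:
        raise ValueError("relative_reduction must be between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    log_hazard_ratio = math.log(1 - relative_reduction)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / log_hazard_ratio ** 2)


def person_years_required(events: int, control_rate_per_100py: float,
                          relative_reduction: float) -> float:
    """Total person-years needed to observe `events`, averaging both arms' rates."""
    control_rate = control_rate_per_100py / 100
    mean_rate = control_rate * (1 + (1 - relative_reduction)) / 2
    return events / mean_rate


events_50 = events_required(0.50)  # design target used by most trials
events_30 = events_required(0.30)  # a more moderate effect

# Hypothetical planning assumption (3/100 person-years) versus a lower
# observed control incidence (2/100 person-years), both for a 50% reduction.
py_planned = person_years_required(events_50, 3.0, 0.50)
py_observed = person_years_required(events_50, 2.0, 0.50)
```

Under these assumptions, detecting a 30% reduction needs 247 events compared with 66 for a 50% reduction, and a drop in control incidence from 3 to 2 per 100 person-years raises the follow-up required from roughly 2,900 to 4,400 person-years.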
Our assessment of study ethics focused on a description of services available to study participants alongside the interventions being studied. While there is no universally accepted “standard of care” for such services, most studies have included, at a minimum, individualized counseling about risk behaviors and the free distribution of condoms. Many also provided STD treatment, which is a valuable benefit to participants. Studies may also offer circumcision referrals to eligible male volunteers. One recent vaccine trial reported that uptake of circumcision during follow-up was associated with higher baseline behavioral risk. Co-interventions such as circumcision, condom use, and behavior change have the potential to significantly reduce HIV incidence in the study population. Finally, it is also noteworthy that recent trials all reported referral for HIV care. While such referral is not strictly a co-intervention, it indicates the increasing availability of antiretroviral therapy in the trial communities; this greater availability may also reduce population HIV incidence [56, 57].
One limitation of our review is that we assessed quality based on available information from published reports and did not contact primary authors for clarification. Some studies may have failed to report on certain criteria that were actually adequately addressed in study planning and execution. However, the methodological quality and reporting quality of studies are thought to be highly related. Another limitation is that we did not focus on issues specific to particular interventions. For example, it has been pointed out that in studies of vaginal microbicides, HIV acquisition via unprotected anal sex would render the intervention ineffective and bias results towards no apparent effect on vaginal transmission. For studies of interventions contraindicated in pregnancy, family planning is an important means of retaining participants [44, 60]. Testing of microbicides for safety and acceptability among pregnant women would help address this issue.
In conclusion, most trials of biomedical HIV prevention interventions were rated as acceptable on the key quality components we reviewed, and reporting has improved significantly over time. However, several common challenges in HIV prevention trial design and execution tend to bias results towards the null. Our review of the quality of HIV prevention studies compiles useful information on the experiences of previously reported RCTs, and highlights a need for quality improvement in several key areas that influence study power, including participant retention, adherence to interventions, and estimation of expected HIV incidence. Investigators should assume that incidence may be lower than predicted or decrease during follow-up, and account for this in the planning of trials. The HIV epidemic continues to claim lives, and effective prevention strategies are urgently needed. Attention to lessons learned from previous studies can optimize trial design for promising new interventions.
We would like to thank Elizabeth Uleryk, Director, Library & Archives at the Hospital for Sick Children in Toronto, for providing expert assistance on search methodology. The authors gratefully acknowledge the support of the Ontario Ministry of Health and Long-Term Care.
SMG is supported by a Clinician Scientist Award from the University of Toronto and by a Mentored Patient-Oriented Career Development Award (K23 AI069990) from the National Institutes of Health. Dr. Bayoumi is supported by a Canadian Institutes for Health Research / Ontario Ministry of Health and Long-Term Care Applied Chair in Health Services and Policy Research. The Centre for Research on Inner City Health is supported in part by a grant from the Ontario Ministry of Health and Long-Term Care. The views expressed in this article are those of the authors, and no official endorsement by supporting agencies is intended or should be inferred.
Susan M. Graham, Department of Medicine, University of Washington, Seattle, Washington, USA; Department of Health Policy, Management, and Evaluation, University of Toronto, Toronto, Ontario, Canada.
Prakesh S. Shah, Department of Health Policy, Management, and Evaluation, University of Toronto, Toronto, Ontario, Canada; Department of Pediatrics, Mount Sinai Hospital, Toronto, Ontario, Canada; Department of Pediatrics, University of Toronto, Toronto, Ontario, Canada.
Zoë Costa-von Aesch, Faculty of Medicine, McGill University, Montreal, Quebec, Canada.
Joseph Beyene, Department of Health Policy, Management, and Evaluation, University of Toronto, Toronto, Ontario, Canada; Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Ontario, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
Ahmed M. Bayoumi, Departments of Medicine and Health Policy, Management, and Evaluation, University of Toronto, Toronto, Ontario, Canada; Centre for Research on Inner City Health, The Keenan Research Centre in the Li Ka Shing Knowledge Institute of St. Michael’s Hospital, Toronto, Ontario, Canada; Division of General Internal Medicine, St. Michael’s Hospital, Toronto, Ontario, Canada.