Why Monthly HIV Diagnostic Tests?
The rationale for frequent HIV testing is to improve the assessment of immune correlates. The monthly schedule of HIV testing will allow 50-80% of infected subjects to be caught in the acute phase (antibody-negative phase) of infection, before HIV has undergone significant evolution, although some T cell escape may occur in the early weeks post-HIV acquisition (Goonetilleke et al., 2009). This allows analysis of the originating HIV sequences in the majority of infected subjects, thereby allowing a ‘sieve analysis’ to be conducted, which is a method for identifying how the vaccine efficacy on HIV acquisition depends on the genetics of the transmitted/founder HIV sequences relative to the insert HIV sequences represented in the tested vaccine (Gilbert, McKeague, and Sun, 2008); in particular, to identify HIV amino acid sites, and sets of sites in antibody or T cell epitopes, that have an elevated rate of mismatch to the insert sequences in vaccine versus placebo recipients.
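To make the site-level comparison concrete, below is a minimal sketch of one way to test for a sieve effect at a single amino acid site, using hypothetical mismatch counts; the formal methods of Gilbert, McKeague, and Sun (2008), with an adjustment for multiplicity across sites, would be used in practice.

```python
# Minimal sketch of a site-level sieve test: compare the rate of amino
# acid mismatch to the vaccine insert at one epitope site between
# infected vaccine and placebo recipients.
from scipy.stats import fisher_exact

# Counts of (mismatch, match) to the insert residue among infected
# subjects; values are hypothetical illustration, not trial data.
vaccine = (18, 4)   # hypothetical: 18 of 22 infected vaccine recipients mismatched
placebo = (12, 13)  # hypothetical: 12 of 25 infected placebo recipients mismatched

# One-sided test of an elevated mismatch rate in vaccine vs. placebo
odds_ratio, p_value = fisher_exact([vaccine, placebo], alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.4f}")
# A small p-value suggests a sieve effect at this site; a full analysis
# repeats this across sites with a multiplicity adjustment.
```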
Sieve analysis is intrinsically tied to the evaluation of immune correlates of protection; the two are two sides of the same coin. Specifically, on the one hand, if VE > 0% and a sieve effect (i.e., an elevated rate of amino acid mismatches to the insert sequence for vaccine versus placebo sequences) is detected, then the implication, given that the trial is randomized and double-blinded, is that vaccine-induced immune responses to certain HIV epitopes must have caused the protection. Therefore, the detected sieve effect leads to follow-up explorations to identify measurable immune responses that capture (at least partially) these protective responses and thereby have some validity as surrogate endpoints for HIV infection. For example, identification of a sieve effect in 7 particular HIV antibody epitopes generates the hypothesis that the sum of neutralization levels to these 7 targets, matched to the vaccine insert sequence, would have high surrogate value.
On the other hand, sieve analysis is very useful for validating the degree to which an immunological measurement is a valid surrogate endpoint. To illustrate, suppose VE > 0% and the candidate surrogate, S, is a summary measure of the magnitude and breadth of neutralizing antibody titers to a panel of pseudo-viruses constructed from acute-phase HIV isolates from infected placebo recipients. If S has surrogate value to predict VE, it must be the case that amino acid differences from the vaccine insert are larger in infected vaccine than placebo recipients; this logically follows because genetic mutations in antibody epitopes are known to affect neutralization levels. Therefore, sieve analysis is a tool for corroborating the surrogate value of S as a surrogate of protection (SoP). However, this sieve analysis would not be possible with infrequent HIV diagnostic testing, such as the semi-annual schedule used by the previous efficacy trials, given that too few infected subjects would be caught in the acute phase to afford an assessment of the vaccine effect on transmitted sequences.
In addition, the sieve analysis may be directly incorporated into the surrogate assessment described above, by estimating the VE(s) curve with the endpoint definition restricted to HIV infection with a strain within a certain threshold of genetic distance to the vaccine insert. This analysis would be repeated for a range of thresholds. Greater variation in the VE(s) curve for thresholds closer to the insert sequence would support the value of the immune biomarker as a surrogate endpoint.
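As an illustration of the endpoint-restriction step only, the sketch below computes a crude VE estimate for a range of genetic-distance thresholds, assuming equal person-time at risk per arm and hypothetical founder-strain distances; estimating the full VE(s) curve would layer the surrogate-endpoint methods on top of each restricted endpoint definition.

```python
# Sketch of threshold-restricted endpoint definitions: for each genetic
# distance threshold, only infections by strains within the threshold of
# the vaccine insert count as endpoints. Data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical founder-strain distances to the vaccine insert (0 = identical).
vaccine_dists = rng.beta(4, 2, size=40)  # vaccine breakthroughs skew far from insert
placebo_dists = rng.beta(2, 2, size=60)  # placebo infections more spread out

for threshold in (0.2, 0.4, 0.6, 0.8, 1.0):
    v = int(np.sum(vaccine_dists <= threshold))  # vaccine-arm endpoints
    p = int(np.sum(placebo_dists <= threshold))  # placebo-arm endpoints
    if p > 0:
        ve = 1.0 - v / p  # crude VE, assuming equal person-time per arm
        print(f"threshold {threshold:.1f}: vaccine={v:2d}, placebo={p:2d}, VE={ve:+.2f}")
```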
Intention-to-Treat and Per-Protocol Analysis of VE
Vaccine efficacy trials commonly assess VE in the intention-to-treat (ITT) cohort, which consists of all randomized subjects, as well as in the modified intention-to-treat (MITT) cohort, which is the subset of the ITT cohort later discovered not to have been HIV infected at baseline. Because blinded procedures are used for ascertaining baseline infection status, the MITT analysis has the same validity from randomization as the ITT analysis, such that the MITT analysis is generally preferred, given that it assesses vaccine efficacy in HIV uninfected persons. In addition, given the ubiquitous concern that a vaccine may not confer protection until all or at least some of the immunizations are received, most vaccine efficacy trials also assess vaccine efficacy in the sub-cohort that receives all of the immunizations and is disease-free after the immunization series; this sub-cohort may be referred to as the per-protocol (PP) cohort (Horne, Lachenbruch, and Goldenthal, 2001). All of the past HIV vaccine efficacy trials assessed VE in both the MITT and PP cohorts, with the MITT assessment the primary analysis in each case (Gilbert et al., 2010).
As stated above, the MITT analysis is primary because the comparator groups are guaranteed to have balanced prognostic factors on average due to randomization and double-blinding, such that the analysis assesses the causal effect of assignment to vaccine. In contrast, the standard analytic approach to assessing PP VE applies the same method as used for the MITT analysis, comparing HIV infection incidence between the subgroups of vaccine and placebo recipients observed to qualify for the PP sub-cohort. However, these comparator sub-cohorts are subsets of randomized subjects, such that the analysis is susceptible to possible post-randomization selection bias (Rosenbaum, 1984; Robins and Greenland, 1992; Frangakis and Rubin, 2002), making the results difficult to interpret meaningfully. To improve upon this standard analysis of VE in the PP cohort, an analytic method that adjusts for measured factors that simultaneously predict HIV infection and PP sub-cohort membership (such factors cause the selection bias) should be applied (e.g., Lu and Tsiatis, 2008; Tsiatis et al., 2008; Zhang, Tsiatis, and Davidian, 2008; Moore and van der Laan, 2009; Zhang and Gilbert, 2010), which in addition to correcting for bias can improve statistical power by leveraging prognostic factors. Moreover, because some simultaneously predictive factors may be unmeasured, the sensitivity of results to such factors should also be investigated, following the paradigm described in Scharfstein, Rotnitzky, and Robins (1999). Therefore, in our proposed design we assess VE in the MITT cohort for the primary analysis and conduct a causal sensitivity analysis of PP VE for the secondary analysis, wherein the answer is reported as a range of point estimates and a corresponding union of 95% confidence intervals (a so-called “sensitivity interval”), which account for a spectrum of potential levels of post-randomization selection bias (Shepherd, Gilbert, and Lumley, 2007).
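The mechanics of a sensitivity interval can be illustrated compactly. The sketch below is schematic: it shifts a hypothetical log relative risk estimate across a grid of sensitivity-parameter values and takes the union of the resulting 95% confidence intervals; it is not the Shepherd, Gilbert, and Lumley (2007) estimator, and all numbers are hypothetical.

```python
# Schematic "sensitivity interval" for PP VE: union of 95% CIs across a
# grid of sensitivity parameters indexing unmeasured selection bias.
import numpy as np

log_rr_hat = np.log(0.60)  # hypothetical PP relative risk estimate (VE = 40%)
se = 0.15                  # hypothetical standard error on the log scale
z = 1.96

lowers, uppers = [], []
for beta in np.linspace(-0.3, 0.3, 13):  # grid of selection-bias shifts
    adj = log_rr_hat + beta              # bias-adjusted log relative risk
    lowers.append(1 - np.exp(adj + z * se))  # VE lower bound for this beta
    uppers.append(1 - np.exp(adj - z * se))  # VE upper bound for this beta

# The reported interval spans all plausible bias levels considered.
print(f"sensitivity interval for PP VE: ({min(lowers):.2f}, {max(uppers):.2f})")
```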
Timing of Reporting of Results and of Un-blinding
With respect to reporting the results, the proposed design has two stages: for stage 1, results are reported on VE(0-18); and for stage 2 [which occurs if and only if at least one vaccine regimen achieves positive efficacy for VE(0-18)], results are reported on the durability of VE between 18 and 36 months. For stage 2 the issues are simple: all vaccine arms advanced to stage 2 plus the placebo arm continue blinded follow-up until the last enrolled subject has 36 months of follow-up, at which time the final analysis is conducted and the results reported.
The issues are more complicated for stage 1, with the approach to un-blinding dependent upon which boundaries are reached. As soon as a vaccine arm reaches a conclusion [either by reaching the potential-harm boundary, the non-efficacy boundary, the high efficacy boundary, or completing the evaluation of VE(0-18) without reaching a boundary], the result is reported. This conveys the result to the field as expeditiously as possible. If a vaccine arm completed its evaluation by reaching the potential-harm boundary, then the arm would be immediately un-blinded, given the ethical warrant to inform participants of the potential harm caused by exposure to the vaccine. The other study arms would continue blinded. If a vaccine arm reaches the high efficacy boundary, then the placebo group is immediately un-blinded and offered this vaccine. If it is the single vaccine arm design, then the sole vaccine group is also un-blinded. However, if it is the multiple vaccine arm design, and at least two vaccine arms are still being evaluated, then the blind is maintained for all of the vaccine arms, which allows continuing accrual of data for comparing vaccine efficacy head-to-head among the vaccine regimens. Furthermore, if a vaccine arm reaches the high efficacy boundary, it may be worth continuing the vaccine's evaluation out to 36 months. While a rigorous assessment of durability of VE will likely be impossible (given that the contemporaneous comparator placebo group is being offered the vaccine), the additional follow-up may nonetheless provide useful data about the vaccine, which would be difficult to collect in follow-on studies. Further thought is needed on this issue, and on whether it is also warranted to offer subjects assigned to the other vaccine arms the highly efficacious vaccine.
Next we consider the scenario wherein a vaccine arm completes its evaluation by reaching the non-efficacy boundary. In this case, blinded follow-up under the original HIV diagnostic testing schedule would continue either until all other vaccine arms are weeded out, or, in the case that at least one vaccine arm achieves positive efficacy, until all enrolled subjects have 18 months of follow-up. This continued blinded follow-up would contribute information to the analyses of safety, VE(0-18) (including comparisons with other vaccine regimens), and immune correlates of protection. If, alternatively, the arm were un-blinded, then the post-un-blinding data would be excluded from the main analyses of vaccine efficacy and of immunological surrogate endpoints, given that the un-blinding may lead to imbalances in HIV prognostic factors between the vaccine and placebo groups (and between vaccine arms), which could not be confidently corrected for statistically due to the inability to accurately measure HIV risk behavior and exposure. Given the scientific benefit accrued from maintaining the blind and the absence of evidence of harm caused to participants, it seems ethical to maintain blinding for subjects assigned a vaccine regimen shown to have low efficacy at best.
For operational reasons, ideally all study arms would be un-blinded at the same time, as un-blinding one study arm could compromise follow-up for the participants assigned to the other arms. As discussed above, by dividing the trial into two stages the design does not achieve this, as vaccine arms reaching a stopping boundary will be un-blinded once the evaluation of VE(0-18) is completed, whereas vaccine arms not reaching a stopping boundary will be un-blinded once stage 2 is completed (expected at least 18 months later). While one approach would keep vaccine arms reaching the non-efficacy boundary blinded all the way through stage 2, this seems like a poor use of resources, given that non-efficacy over 18 months is expected to predict non-efficacy from 18-36 months, such that it is prudent to complete the evaluation of non-efficacious vaccines at 18 months. Thus, our approach makes the un-blinding as simultaneous as ethically warranted within each stage. As discussed above, for stage 2 a completely simultaneous un-blinding is achieved, whereas for stage 1, if no vaccine arms reach the potential-harm boundary then a completely simultaneous un-blinding is achieved. The informed consent process would describe the events that would trigger un-blinding, and the approach to un-blinding would be vetted with local Institutional Review Boards and the DSMB.
In summary, the whole study is un-blinded at the first of the following events: (1) the last of the vaccine regimens is weeded out, either by reaching the potential-harm boundary or the non-efficacy boundary; (2) the last of the vaccine regimens reaches the high efficacy boundary; (3) the last enrolled subject reaches 36 months of follow-up, in the case that neither event (1) nor (2) occurs. For event (1), the trial has a maximum duration of 18 months beyond the last enrolled subject, and a minimum duration equal to the time at which the last weeded-out vaccine regimen either reaches the potential-harm boundary or accrues n1 infections diagnosed within 18 months.
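This trigger logic can be stated compactly. The sketch below is a schematic restatement of the three events, with hypothetical status labels chosen for illustration, not protocol code; events (1) and (2) together amount to every vaccine arm having reached a stopping boundary.

```python
# Schematic encoding of the study-level un-blinding rule summarized above.
from enum import Enum

class ArmStatus(Enum):
    ONGOING = "still under blinded evaluation"
    POTENTIAL_HARM = "reached potential-harm boundary"
    NON_EFFICACY = "reached non-efficacy boundary"
    HIGH_EFFICACY = "reached high efficacy boundary"

TERMINAL = {ArmStatus.POTENTIAL_HARM, ArmStatus.NON_EFFICACY, ArmStatus.HIGH_EFFICACY}

def study_unblinds(vaccine_arms: list[ArmStatus], months_since_last_enrollee: float) -> bool:
    """Events (1) and (2): every vaccine arm has reached a stopping boundary
    (weeded out or high efficacy). Event (3): the last enrolled subject
    reaches 36 months of follow-up."""
    if all(arm in TERMINAL for arm in vaccine_arms):
        return True
    return months_since_last_enrollee >= 36.0

# Example: one arm weeded out for non-efficacy, one still blinded at month 20.
print(study_unblinds([ArmStatus.NON_EFFICACY, ArmStatus.ONGOING], 20.0))  # False
```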
What Does Completing a Vaccine Regimen for Non-Efficacy Entail?
As described above, upon reaching the non-efficacy boundary, the primary result on VE(0-18) would be reported, thus providing data as expeditiously as possible. As shown earlier, by the time a vaccine regimen reaches the non-efficacy boundary, accrual is very likely to be complete, in which case weeding out a regimen would not spare enrollees, all of whom would have received at least one immunization. On the other hand, a substantial fraction of enrollees will likely not yet have completed the immunization series, such that ceasing vaccinations upon reaching a non-efficacy boundary would spare immunizations. For example, at the median stopping time of a vaccine with VE(0-18) = 0%, approximately 3000 of the 4300 enrollees (pooled over a vaccine arm and placebo) would have completed the immunization series through Month 6, and approximately 1800 through Month 12. Moreover, regardless of the number of immunizations spared, it still may be warranted to cease immunizations at the time of reaching the non-efficacy boundary, as the primary question about VE(0-18) would have been answered. Furthermore, if accrual lags behind the planned accrual, then this approach may spare many immunizations and substantially decrease total enrollment. Lastly, if a vaccine regimen reaches the potential-harm boundary, then a large number of enrollments and immunizations would likely be spared. Therefore, the proposed design ceases immunizations and accrual to vaccine arms if and when they reach a non-efficacy boundary.
Equal Versus Unequal Allocation to the Vaccine and Placebo Groups
The design equally allocates subjects to each study group, which is inefficient for the two- and three-vaccine arm trials, for which the efficient design would randomize more subjects to the placebo arm. The rationale for equal allocation is to increase the information for the second and third secondary objectives to evaluate immunological correlates of infection rate in the vaccine groups and to compare vaccine efficacy among the vaccine regimens. Equal allocation results in efficiency loss for the primary objective in exchange for efficiency gain for key secondary objectives. This reflects the premise of the design that development of immune correlates of protection and head-to-head comparisons of vaccine efficacy are priorities for HIV vaccine research. More research is needed to thoroughly define the trade-offs of the equal-versus-unequal allocation approaches.
Accommodation of Pre-Exposure Prophylaxis (PrEP) and Other HIV Prevention Interventions
Recently an efficacy trial in men who have sex with men in the Americas (mostly South America) demonstrated that daily oral PrEP use [fixed-dose combination tenofovir disoproxil fumarate (TDF) and emtricitabine (FTC)] provided an estimated 44% reduction in the incidence of HIV infection compared to placebo (Grant et al., 2010). Moreover, the incidence rate appeared especially low in men with detectable PrEP drug levels, suggesting that the PrEP efficacy is higher for adherent subjects. Because the PrEP drugs TDF and FTC are approved and some vaccine trial participants may take PrEP, it is relevant to consider how the design accommodates PrEP use. Moreover, several other efficacy trials of PrEP are ongoing, such that it is prudent to plan for how the trial design will respond to future results that will become available before or during the trial.
The baseline approach to accommodating PrEP does not alter the primary analysis, as it is intention-to-treat and compares HIV incidence between the vaccine and placebo groups while disregarding PrEP use. The event-driven design set-up is also unaltered, such that with or without PrEP the same numbers of HIV infections trigger the interim and final analyses. However, once the required numbers of events are fixed, PrEP use impacts the anticipated sample size needed to achieve the required number of infections in a timely manner, through its impact on the background HIV incidence. For example, if 10% PrEP use occurs, and we assume that PrEP users have a 50% reduction in incidence, then the sample size would need to be increased by approximately 5% (0.05 = 0.10 × 0.50) in order to deliver results within the same time-frame as the baseline scenario (no PrEP use). Alternatively, if all participants are offered PrEP and 80% accept it, then the sample size would need to be increased by approximately 40% (0.40 = 0.80 × 0.50).
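A worked version of this calculation is sketched below, with the assumptions labeled in code. Note that the in-text percentages are first-order approximations; the exact inflation needed to accrue a fixed event count in the same calendar time is 1/(1 − reduction) − 1, which diverges from the approximation when the incidence reduction is large.

```python
# Sample-size inflation under assumed PrEP uptake and per-user efficacy.
def prep_inflation(frac_on_prep: float, prep_efficacy: float) -> tuple[float, float]:
    """Return (first-order, exact) sample-size inflation needed to accrue
    a fixed number of infections in the same calendar time when background
    incidence is reduced by PrEP use."""
    reduction = frac_on_prep * prep_efficacy  # overall incidence reduction
    first_order = reduction                   # approximation used in the text
    exact = 1.0 / (1.0 - reduction) - 1.0     # exact for a fixed event count
    return first_order, exact

# Hypothetical scenarios matching the in-text examples.
for frac, eff in [(0.10, 0.50), (0.80, 0.50)]:
    approx, exact = prep_inflation(frac, eff)
    print(f"PrEP use {frac:.0%}, user efficacy {eff:.0%}: "
          f"inflation ~{approx:.1%} (first order), {exact:.1%} (exact)")
# -> ~5.0% vs 5.3% exact; ~40.0% vs 66.7% exact
```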
Given the difficulty of predicting the degree of PrEP use, the trial would monitor PrEP use through self-report questionnaires and PrEP drug level measurement. The enrollment target could be adjusted based on this monitoring; such an adaptation would pose minimal risk to study integrity because it is based on blinded data, and a deterministic plan could be pre-specified for which data lead to which kinds of trial expansion. There is also uncertainty in the degree of PrEP efficacy, and this is addressed through the operational futility monitoring; the level of PrEP efficacy will affect the background HIV incidence, and the lower the incidence the more likely the operational futility guidelines will be met. The operational futility monitoring is based primarily on rates of accrual, HIV infection, and dropout during the study, regardless of the amount of PrEP use or PrEP efficacy.
For trial design set-up, it is relevant to evaluate whether PrEP is expected to enhance or diminish vaccine efficacy, as this would impact the maximum plausible effect size VE, and hence could result in powering the trial for a different effect size. Currently, the data on potential interaction of vaccines and PrEP are too scant to warrant altering the effect size assumptions.
A second approach to accommodating PrEP use would offer a voluntary second randomization to PrEP or to PrEP placebo. This would form three analysis strata: subjects assigned PrEP, subjects assigned PrEP placebo, and subjects who declined the second randomization. The primary analyses would be intention-to-treat as above, the difference being that they would be stratified. For each regimen, HIV incidence would be compared between vaccine and placebo within each of the three strata separately, and then aggregated into one overall estimate of VE; for example, assuming a common VE within each stratum and using stratum-specific baseline hazards in the Cox proportional hazards model. This analysis is valid because randomization and double-blinding guarantee balance in HIV prognostic factors within each stratum. While an interaction of PrEP and vaccine would complicate the interpretation, the assessment of the common VE still has a useful interpretation as vaccine efficacy averaged over the three strata.
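A minimal sketch of this stratified estimation on simulated hypothetical data is below; it uses the lifelines package (an assumption of convenience, as any proportional hazards software would serve), fitting a common vaccine effect with stratum-specific baseline hazards.

```python
# Stratified Cox model: common vaccine hazard ratio, stratum-specific
# baseline hazards. All data below are simulated and hypothetical.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 3000
df = pd.DataFrame({
    "vaccine": rng.integers(0, 2, n),  # 1 = vaccine, 0 = placebo
    "stratum": rng.choice(["PrEP", "PrEP placebo", "declined"], size=n),
})
# Hypothetical stratum-specific baseline incidence and a common true
# hazard ratio of 0.6 (i.e., VE = 40%) for the vaccine.
base = df["stratum"].map({"PrEP": 0.03, "PrEP placebo": 0.06, "declined": 0.05})
rate = base * np.where(df["vaccine"] == 1, 0.6, 1.0)
event_time = rng.exponential(1.0 / rate)
df["time"] = np.minimum(event_time, 3.0)        # administrative censoring at 3 years
df["infected"] = (event_time <= 3.0).astype(int)

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="infected", strata=["stratum"])
print(f"estimated common VE = {1 - np.exp(cph.params_['vaccine']):.1%}")
```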
This primary analysis does not explicitly account for data on PrEP use or PrEP adherence, because of complications in achieving valid inferences adjusted for post-randomization intermediate variables that are subject to measurement error. However, secondary analyses using causal inference methods would evaluate vaccine efficacy while subjects are actually using and not using PrEP. Additional secondary analyses would compare efficacy among the individual arms (Vaccine + PrEP, Placebo + PrEP, Vaccine + PrEP Placebo, Placebo + PrEP Placebo). A third approach would power the trial to compare efficacy among these individual arms, implying that a larger trial would be needed. These considerations for accommodating PrEP use are also relevant for the use of other HIV prevention approaches. Accommodating microbicides may be particularly relevant given the recent report of a partially efficacious microbicide (point estimate of 39% reduction in HIV incidence compared to placebo) in the CAPRISA 004 Phase 2b efficacy trial of tenofovir gel conducted in South Africa (Karim et al., 2010).