Suppose that *n* patients per group are randomized to placebo or vaccine. Before randomization, all patients receive a rabies vaccine and the immune response to it (*W*_{0}) is measured. Patients are then randomized to either a placebo or an HIV vaccine injection and, shortly thereafter, the immune response to the HIV vaccine (*X*_{0}) is measured in the vaccine group. At closeout, the end of the trial, all uninfected placebo recipients receive the HIV vaccine and, shortly thereafter, their immune response to this vaccine (*X*_{C}) is measured. Let *Y* be the infection indicator and *Z* be the vaccine indicator. A schematic representation of a vaccine trial augmented with BIV and CPV is given in .

Our approach to using these data is perhaps best described using counterfactual reasoning (Rubin, 1974, 1977, 1978; Halloran and Struchiner, 1995) and principal stratification (Frangakis and Rubin, 2002). First, let *W*_{0i} be the baseline rabies-specific adaptive immune response for patient *i*. This is seen in everyone. The response to HIV vaccination is different. One can write *X*_{0i}(*z*) as the (post) baseline HIV-specific immune response to HIV vaccination. We call *X*_{0i}(0), *X*_{0i}(1) potential covariates; *X*_{0i}(1) is measured in vaccine recipients while *X*_{0i}(0) would be 0 in nearly everyone. We say that *X*_{0i}(1) is *realized* in the vaccine group and *unrealized* in the placebo group. Using the terminology of Frangakis and Rubin (2002), *X*_{i}(1) = *x*, *X*_{i}(0) = 0 defines a principal stratum indexed by *x*. Principal strata are a classification of subjects defined by the potential values of a post-treatment variable under each of the treatments being considered. They also call *X*_{0}(1) a principal surrogate and distinguish it from a “statistical” surrogate, which for our setup would be *X*^{obs} = *X*_{0}(1)*Z* + *X*_{0}(0)(1 − *Z*). We next define *Y*_{i}(*z*) as the outcome for person *i* following treatment *z*. We call the pair *Y*_{i}(0), *Y*_{i}(1) potential outcomes. We also define *X*_{Ci}(*z*, *y*) as the closeout HIV-specific adaptive immune response for person *i* when given treatment *z* and following outcome *y*. Only *X*_{Ci}(0, 0) is measured and meaningful.

We make the following simplifying assumptions:

- All patients receive the assigned injections so there is no noncompliance.
- There are no missing data; *W*_{0}, *Y*_{0} are measured on everyone, *X*_{0} is measured on all vaccinees, and *X*_{C} is measured on all placebo uninfecteds.
- No infections occur between the time of randomization and when *X*_{0} is measured, say the interval [0, *m*].

The first two are for simplicity and can be relaxed. For example, if there is some noncompliance but it is governed by an independent random mechanism, our methods could be applied to just the compliers. With data missing completely at random the methods can be applied directly to the observed data. If the data are missing at random, methods that incorporate covariates associated with missingness can be used. The last assumption is more likely to be met if *m* is small. If a few infections occur in [0, *m*], an analysis that throws them out may be acceptable. We discuss how to modify our approach to incorporate infections during [0, *m*] in Section 6.

We next specify probit models for the effect of the “baseline covariate” *X*_{0}(1) on the probability of infection in both groups:

$$p_z(x) = P\{Y(z) = 1 \mid X_0(1) = x\} = \Phi(\beta_0 + \beta_1 x + \beta_2 z + \beta_3 x z), \qquad (1)$$

where Φ( ) is the standard normal c.d.f. (cumulative distribution function). This equation specifies a model for a standard covariate by treatment interaction for a clinical trial. The probit is handy because it is easy to integrate over *x*, which we will need to do later. Note that (1) assumes that *W*_{0} has no effect on *Y*(*z*) once *X*_{0}(1) and *Z* are in the model. This can also be relaxed, as we discuss in Section 5.

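Different causal estimands can be used to quantify the effect of the vaccine as a function of *X*_{0}(1). For example, following Hudgens and Halloran (2004) we define vaccine efficacy as

$$VE(x) = 1 - \frac{P\{Y(1) = 1 \mid X_0(1) = x\}}{P\{Y(0) = 1 \mid X_0(1) = x\}}.$$

With our probit model, a natural estimand is

$$\Delta_P(x) = \Phi^{-1}[P\{Y(1) = 1 \mid X_0(1) = x\}] - \Phi^{-1}[P\{Y(0) = 1 \mid X_0(1) = x\}] = \beta_2 + \beta_3 x.$$

Note that when *β*_{3} = 0, Δ_{P}(*x*) is free of *x*; this is not true for *VE*(*x*).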
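As an illustration, both estimands can be evaluated for a given *β*. The Python sketch below assumes the probit model has the form Φ(*β*_{0} + *β*_{1}*x* + *β*_{2}*z* + *β*_{3}*xz*) and takes Δ_{P}(*x*) as the vaccine-minus-placebo difference on the probit scale. The function names and the numerical values of *β* are hypothetical and chosen only to show that, with *β*_{3} = 0, Δ_{P}(*x*) is constant in *x* while *VE*(*x*) is not.

```python
import numpy as np
from scipy.stats import norm

def infection_prob(x, z, beta):
    """Assumed probit model: P(Y(z) = 1 | X0(1) = x)."""
    b0, b1, b2, b3 = beta
    x = np.asarray(x, dtype=float)
    return norm.cdf(b0 + b1 * x + b2 * z + b3 * x * z)

def vaccine_efficacy(x, beta):
    """VE(x) = 1 - p_1(x) / p_0(x)."""
    return 1.0 - infection_prob(x, 1, beta) / infection_prob(x, 0, beta)

def delta_probit(x, beta):
    """Probit-scale contrast (vaccine minus placebo): beta2 + beta3 * x."""
    _, _, b2, b3 = beta
    return b2 + b3 * np.asarray(x, dtype=float)

# Hypothetical coefficients with no interaction (beta3 = 0).
beta = (-1.0, -0.3, -0.5, 0.0)
x_grid = np.array([0.0, 1.0, 2.0])

ve = vaccine_efficacy(x_grid, beta)   # varies with x
dp = delta_probit(x_grid, beta)       # constant in x
print("VE(x):     ", ve)
print("Delta_P(x):", dp)
```

The contrast is computed directly from the coefficients rather than by inverting Φ, which avoids numerical trouble when the infection probabilities are very close to 0 or 1.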

If *X*_{0i}(1) were observed in everyone, estimation would be straightforward. As *X*_{0i}(1) is not observed in the placebo group, we require at least one of the following two assumptions to proceed:

- *X*_{0i}(1) can be viewed as a baseline covariate, or
- For placebo uninfecteds, *X*_{0i}(1) = *x*_{i} + *U*_{1} and *X*_{Ci}(0, 0) = *x*_{i} + *U*_{2}, where *U*_{1} and *U*_{2} are i.i.d. (independent and identically distributed) with mean 0. We call this time constancy of immune response.

The first assumption is true by design in randomized trials and allows us to impute *X*_{0i}(1) based on *W*_{0i} in the placebo group. While technically measured post-randomization, this “post-baseline” covariate can be used as a baseline covariate. The second assumption allows us to replace *X*_{0i}(1) with *X*_{Ci}(0, 0) as a covariate in the probit model for placebo uninfecteds. Under the model *X* = *x* + *U*, one can think of *x* as the true time-constant immune response, which is observed subject to measurement error, and our interest focuses on the regression of *Y* on *X*. This assumption cannot be accepted uncritically: if the trial is long enough, immune response can diminish with age, as it does for herpes zoster. Additionally, volunteers might get subinfectious exposures to a virus that modify immune response. This is thought possible for HIV, where commercial sex workers showed immune responses to HIV but remained uninfected. However, even here the assumption might hold if the immune response is effectively primed by subinfectious exposure pre-baseline and this response is maintained during the course of the trial. Additionally, this assumption can be examined, as we will discuss in Section 5.

Our final assumption allows us to easily integrate over the distribution of *X*_{0}(1)|*W*_{0}:

- The distribution of (*X*_{0}(1), *W*_{0}) is bivariate normal with moments *μ*_{x}, *μ*_{w}, *σ*_{x}^{2}, *σ*_{w}^{2}, *ρ*.

This assumption can also be relaxed but the integration would be more complicated.
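To make the data structure implied by these assumptions concrete, the following Python sketch simulates one augmented trial. It is only an illustration: the function name, the parameter values, and the probit form Φ(*β*_{0} + *β*_{1}*x* + *β*_{2}*z* + *β*_{3}*xz*) are assumptions of the sketch, and time constancy is imposed exactly (*U*_{1} = *U*_{2} = 0) so that *X*_{C} equals *X*_{0}(1) in placebo uninfecteds.

```python
import numpy as np
from scipy.stats import norm

def simulate_augmented_trial(n, beta, mu_x, mu_w, sigma_x, sigma_w, rho, seed=0):
    """Simulate one trial with n patients per arm (all names are illustrative)."""
    rng = np.random.default_rng(seed)
    cov_xw = rho * sigma_x * sigma_w
    cov = [[sigma_x**2, cov_xw], [cov_xw, sigma_w**2]]
    # Draw (X0(1), W0) for every patient, whether or not X0(1) is later observed.
    x01, w0 = rng.multivariate_normal([mu_x, mu_w], cov, size=2 * n).T
    z = np.repeat([0, 1], n)  # 0 = placebo, 1 = vaccine
    b0, b1, b2, b3 = beta
    y = rng.binomial(1, norm.cdf(b0 + b1 * x01 + b2 * z + b3 * x01 * z))
    # X0 is realized only in vaccinees.
    x0 = np.where(z == 1, x01, np.nan)
    # X_C is measured only in uninfected placebo recipients; with U1 = U2 = 0
    # (an assumption of this sketch) it coincides with X0(1).
    xc = np.where((z == 0) & (y == 0), x01, np.nan)
    return {"w0": w0, "z": z, "y": y, "x0": x0, "xc": xc}

trial = simulate_augmented_trial(
    n=500, beta=(-0.5, -0.3, -0.4, -0.2),
    mu_x=0.0, mu_w=0.0, sigma_x=1.0, sigma_w=1.0, rho=0.7,
)
print({name: values[:3] for name, values in trial.items()})
```

Missing entries are coded as NaN, which mirrors the design: *W*_{0} and *Y* are available for everyone, *X*_{0} only for vaccinees, and *X*_{C} only for placebo uninfecteds.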

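To estimate *β* = (*β*_{0}, *β*_{1}, *β*_{2}, *β*_{3}), we use maximum likelihood. We begin by constructing a likelihood incorporating both BIV and CPV. The likelihood contribution for vaccinees is simple,

$$\prod_{i \in V} p_1(X_{0i})^{Y_i} \{1 - p_1(X_{0i})\}^{1 - Y_i},$$

where *V* is the set of vaccinees and *p*_{z}(*x*) is the infection probability in (1). For uninfected placebo volunteers we use *X*_{Ci} in lieu of *X*_{0i} and their contribution is

$$\prod_{i \in \mathcal{P}(U)} \{1 - p_0(X_{Ci})\},$$

where $\mathcal{P}(U)$ is the set of uninfected placebo recipients. In the placebo infecteds, *X*_{0}(1) is missing and we need to integrate *p*_{0}(*X*_{0}(1)) over the distribution of *X*_{0}(1)|*W*_{0} to obtain their likelihood contribution. Under our last assumption, it follows that *X*_{0}(1)|*W*_{0} = *w*_{0} is normal with mean *μ**(*w*_{0}) = *μ*_{x} + (*ρσ*_{x}/*σ*_{w})(*w*_{0} − *μ*_{w}) and variance *σ**^{2} = *σ*_{x}^{2}(1 − *ρ*^{2}). The (integrated) probability of infection for a person with *W*_{0} = *w*_{0} is thus

$$p_0^*(w_0) = \int \Phi(\beta_0 + \beta_1 x)\, dF(x \mid w_0) = \Phi\left(\frac{\beta_0 + \beta_1 \mu^*(w_0)}{\sqrt{1 + \beta_1^2 \sigma^{*2}}}\right).$$

The right-hand side follows from the result that

$$E\{\Phi(a + bU)\} = \Phi\left(\frac{a + b\mu}{\sqrt{1 + b^2 \sigma^2}}\right)$$

for *U* normal(*μ*, *σ*^{2}). The overall likelihood is thus

$$L_{BC}(\beta) = \prod_{i \in V} p_1(X_{0i})^{Y_i} \{1 - p_1(X_{0i})\}^{1 - Y_i} \times \prod_{i \in \mathcal{P}(U)} \{1 - p_0(X_{Ci})\} \times \prod_{i \in \mathcal{P}(I)} p_0^*(W_{0i}),$$

where $\mathcal{P}(I)$ is the set of infected placebo recipients. Note that $p_0^*(w_0)$ depends on the moments of *X*_{0}(1), *W*_{0}, which are unknown. We advocate estimating these moments using vaccine group data and regarding them as fixed in *L*_{BC}. Because of this, the standard error estimates obtained from the Fisher information matrix are incorrect, and we suggest using the nonparametric bootstrap to obtain standard errors.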
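The construction of *L*_{BC} can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the probit model is taken to be Φ(*β*_{0} + *β*_{1}*x* + *β*_{2}*z* + *β*_{3}*xz*), the function names and the simulated toy data (with hypothetical true coefficients and *X*_{C} set equal to *X*_{0}(1)) are invented for the example, and the moments are plugged in from the vaccine arm and then held fixed during maximization.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def vaccine_arm_moments(w0, x0_obs, z):
    """Plug-in moments of (X0(1), W0), estimated from vaccinees only."""
    xv, wv = x0_obs[z == 1], w0[z == 1]
    return xv.mean(), wv.mean(), xv.std(ddof=1), wv.std(ddof=1), np.corrcoef(xv, wv)[0, 1]

def neg_loglik_bc(beta, w0, z, y, x0_obs, xc, moments):
    """Negative log of L_BC with the moments treated as fixed."""
    b0, b1, b2, b3 = beta
    mu_x, mu_w, sigma_x, sigma_w, rho = moments
    vac = z == 1
    plc_u = (z == 0) & (y == 0)   # uninfected placebo recipients
    plc_i = (z == 0) & (y == 1)   # infected placebo recipients
    # Vaccinees: ordinary probit terms in the realized X0(1).
    eta_v = b0 + b2 + (b1 + b3) * x0_obs[vac]
    ll = np.sum(norm.logcdf(np.where(y[vac] == 1, eta_v, -eta_v)))
    # Uninfected placebo: X_C stands in for X0(1); contribution is 1 - p0(X_C).
    ll += np.sum(norm.logcdf(-(b0 + b1 * xc[plc_u])))
    # Infected placebo: p0 integrated over X0(1) | W0 in closed form.
    mu_star = mu_x + rho * sigma_x / sigma_w * (w0[plc_i] - mu_w)
    var_star = sigma_x**2 * (1.0 - rho**2)
    ll += np.sum(norm.logcdf((b0 + b1 * mu_star) / np.sqrt(1.0 + b1**2 * var_star)))
    return -ll

# Toy data; every numerical value here is hypothetical.
rng = np.random.default_rng(1)
n = 2000
true_beta = np.array([-0.5, -0.3, -0.4, -0.2])
z = np.repeat([0, 1], n)
x_lat, w0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=2 * n).T
eta = true_beta[0] + true_beta[1] * x_lat + true_beta[2] * z + true_beta[3] * x_lat * z
y = rng.binomial(1, norm.cdf(eta))
x0_obs = np.where(z == 1, x_lat, np.nan)
xc = np.where((z == 0) & (y == 0), x_lat, np.nan)

moments = vaccine_arm_moments(w0, x0_obs, z)
fit = minimize(neg_loglik_bc, np.zeros(4),
               args=(w0, z, y, x0_obs, xc, moments), method="BFGS")
print("estimated beta:", fit.x)
```

Because the moments are estimated and then treated as known, the inverse Hessian reported by the optimizer should not be used for standard errors; a bootstrap that resamples patients and repeats both the moment estimation and the maximization is the natural companion to this sketch.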

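We can also construct likelihoods based on augmenting the usual design with BIV alone or CPV alone. These are, respectively,

$$L_B(\beta) = \prod_{i \in V} p_1(X_{0i})^{Y_i} \{1 - p_1(X_{0i})\}^{1 - Y_i} \times \prod_{i \in \mathcal{P}} p_0^*(W_{0i})^{Y_i} \{1 - p_0^*(W_{0i})\}^{1 - Y_i},$$

where $\mathcal{P}$ is the set of placebo recipients and $p_0^*(w_0)$ denotes the integrated probability of infection for a placebo recipient with *W*_{0} = *w*_{0}, and

$$L_C(\beta) = \prod_{i \in V} p_1(X_{0i})^{Y_i} \{1 - p_1(X_{0i})\}^{1 - Y_i} \times \prod_{i \in \mathcal{P}(U)} \{1 - p_0(X_{Ci})\} \times \Phi\left(\frac{\beta_0 + \beta_1 \mu_x}{\sqrt{1 + \beta_1^2 \sigma_x^2}}\right)^{\#\mathcal{P}(I)},$$

where $\#\mathcal{P}(I)$ is the number of placebo infecteds. The last Φ( ) in *L*_{C}(*β*) is just the probability that a generic placebo patient is infected and equals *E*{Φ(*β*_{0} + *β*_{1}*X*_{0}(1))}, where *X*_{0}(1) is normal(*μ*_{x}, *σ*_{x}^{2}). Once the *β*’s have been estimated, it is a simple matter to plug them into a causal estimand. Standard errors and confidence intervals for causal estimands can be computed from the bootstrap.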
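The marginal placebo infection probability in *L*_{C}, like the integrated probability used for placebo infecteds, rests on the identity *E*{Φ(*a* + *bU*)} = Φ((*a* + *bμ*)/√(1 + *b*^{2}*σ*^{2})) for *U* normal(*μ*, *σ*^{2}). The short Python check below compares the closed form with numerical integration; the function names and the parameter values are hypothetical and serve only as a sanity check of the identity.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def probit_normal_mean(a, b, mu, sigma):
    """Closed form for E{Phi(a + b U)} with U ~ normal(mu, sigma^2)."""
    return norm.cdf((a + b * mu) / np.sqrt(1.0 + b**2 * sigma**2))

def probit_normal_mean_quad(a, b, mu, sigma):
    """The same expectation by numerical integration over the density of U."""
    def integrand(u):
        return norm.cdf(a + b * u) * norm.pdf(u, loc=mu, scale=sigma)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

# Hypothetical values: a = beta0, b = beta1, (mu, sigma) = moments of X0(1).
a, b, mu, sigma = -0.5, -0.3, 1.0, 2.0
closed_form = probit_normal_mean(a, b, mu, sigma)
numeric = probit_normal_mean_quad(a, b, mu, sigma)
print(closed_form, numeric)
```

Having the expectation in closed form is what keeps the likelihoods inexpensive to evaluate: no numerical integration is needed inside the optimizer or inside each bootstrap replicate.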