Let *n* denote the total number of subjects in the vaccine trial. For subject *i* (*i* = 1, …, *n*), let *V*_{i} denote the observed treatment indicator, *W*_{i} denote a collection of first-phase baseline covariates in the case-cohort sampling (measured on everyone), and *S*_{i}(*V*) denote the potential immune response of the subject if he/she is assigned vaccine (*V* = 1) or placebo (*V* = 0). Similarly, for *V* = 1, 0, let *T*_{i}(*V*) and *C*_{i}(*V*) be the potential failure time and censoring time, and *X*_{i}(*V*) = min{*T*_{i}(*V*), *C*_{i}(*V*)} and *δ*_{i}(*V*) = *I*(*T*_{i}(*V*) ≤ *C*_{i}(*V*)). Let *t*_{1}, …, *t*_{K} indicate the fixed visit times, with *t*_{2}, …, *t*_{K} the possible discrete failure times for *X*_{i}(*V*_{i}). Let  denote censored at the final visit and *M*_{i} denote the last visit number of subject *i* during the trial period, thus, *M*_{i} ∈ {1, …, *K*}. For vaccine recipients at risk at *t*_{1} and in the *IC*, the immune response *S*_{i}(*V*) is measured at time *t*_{1}. Letting *R*_{i}(*V*) denote the potential at-risk indicator at *t*_{1}, *S*_{i}(*V*) is only defined if *R*_{i}(*V*) = 1; otherwise, we put *S*_{i}(*V*) = *. We assume that the censoring process *C*_{i}(*V*) and failure time distribution *T*_{i}(*V*) are independent given {*W*_{i}, *R*_{i}(*V*), *S*_{i}(*V*)}.
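The notation above can be made concrete with a small sketch. It is only an illustration, not the authors' code: the function name `observed_data`, the use of NaN to stand in for the undefined value `*`, and the toy numbers are all assumptions introduced here. For one treatment arm it builds *X* = min{*T*, *C*}, *δ* = *I*(*T* ≤ *C*), and an immune response that is left undefined when the at-risk indicator *R* is 0.

```python
import numpy as np

def observed_data(T, C, R, S):
    """Sketch: derive (X, delta, S_obs) for one arm from potential quantities.

    T, C : potential failure and censoring times (here, visit numbers)
    R    : at-risk indicator at t_1 (1 = at risk, 0 = not at risk)
    S    : immune response, meaningful only when R == 1
    """
    T = np.asarray(T, dtype=float)
    C = np.asarray(C, dtype=float)
    R = np.asarray(R, dtype=int)
    S = np.asarray(S, dtype=float)

    X = np.minimum(T, C)                  # X = min{T, C}
    delta = (T <= C).astype(int)          # delta = I(T <= C)
    S_obs = np.where(R == 1, S, np.nan)   # NaN plays the role of "*" (assumption)
    return X, delta, S_obs

# Hypothetical toy values for four subjects in one arm.
T = [3, 5, 2, 4]
C = [4, 5, 1, 2]
R = [1, 1, 0, 1]
S = [0.8, 1.5, 0.3, 2.1]

X, delta, S_obs = observed_data(T, C, R, S)
```

In this toy input the third subject is not at risk at *t*_{1}, so the recorded response is treated as undefined regardless of the value supplied.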

Suppose that {*V*_{i}, *W*_{i}, *R*_{i}(0), *R*_{i}(1), *S*_{i}(0), *S*_{i}(1), *X*_{i}(0), *X*_{i}(1), *δ*_{i}(0), *δ*_{i}(1), *i* = 1, …, *n*} are i.i.d. We make the following assumptions to identify the estimands:

A1. Stable unit treatment value assumption (SUTVA).

A2. Ignorable treatment assignments. Conditional on *W*_{i}, *V*_{i} is independent of {*R*_{i}(0), *R*_{i}(1), *S*_{i}(0), *S*_{i}(1), *X*_{i}(0), *X*_{i}(1), *δ*_{i}(0), *δ*_{i}(1)}.

Assumption A1 guarantees the “consistency” property (i.e., the observed outcomes for a subject assigned *V* equal his or her potential outcomes if assigned *V*) and that the potential outcomes of one subject are not affected by the treatment assignments of other subjects. A2 holds for randomized, blinded trials.

Under the above assumptions, we define two vaccine efficacy estimands:

1. Conditional on joint potential outcomes (joint VE)

2. Conditional on marginal potential outcome (marginal VE)

The estimand *VE*(*s*_{1}, *s*_{0}) conditions on membership in the basic principal stratum {*S*(1) = *s*_{1}, *S*(0) = *s*_{0}, *R*(1) = *R*(0) = 1}, and *VE*(*s*_{1}) conditions on membership in a union of basic principal strata [Frangakis and Rubin (2002)]. The estimands condition on *R*_{i}(1) = *R*_{i}(0) = 1 or on *R*_{i}(1) = 1 because *S*_{i}(*V*) is only defined if *R*_{i}(*V*) = 1, *V* = 0, 1. The estimands are principal stratification estimands in that the pair (*S*(1), *S*(0)) or *S*(1) can be treated as a baseline covariate. However, they are not causal estimands, because the numerators and denominators condition on different events *T*(1) ≥ *t*_{k−1} and *T*(0) ≥ *t*_{k−1}. Nevertheless they are scientifically interesting, in the same way that a hazard ratio conditional on baseline covariates is interesting.

To help identify the estimands, only subjects with *R*_{i}(*V*_{i}) = 1 are included in the analysis, and we assume the following:

A3. Equal drop-out and risk up to time *t*_{1}: *R*_{i}(1) = 1 ⇔ *R*_{i}(0) = 1.

A3 implies that subjects observed to be at risk at *t*_{1} will have *R*_{i}(1) = *R*_{i}(0) = 1, so that *S*_{i}(1) and *S*_{i}(0) are both defined.

In addition to A1–A3, identifiability of *VE*(*s*_{1}, *s*_{0}) requires a way to predict *S*_{i}(1) for subjects with *V*_{i} = 0 and a way to predict *S*_{i}(0) for subjects with *V*_{i} = 1. Identifiability of *VE*(*s*_{1}) is easier because only the *S*_{i}(1) for subjects in arm *V*_{i} = 0 must be predicted. Furthermore, for our motivating application, typically the immune response *S*_{i}(0) is zero for all placebo recipients, because exposure to the vaccine is necessary to stimulate an immune response. For these reasons, henceforth, we focus on the marginal estimand *VE*(*s*_{1}). Note that, for applications with *S*_{i}(0) = 0 for all *i*, *VE*(*s*_{1}) = *VE*(*s*_{1}, 0).

We propose a Cox model for the discrete cumulative hazard function Λ(*t*),

with *Z* = {*V*, *S*(1), *V S*(1), *W*′}′, *β* = {*β*_{1}, *β*_{2}, *β*_{3}, *β*′_{4}}′, and Λ_{0}(·) the discrete baseline cumulative hazard function. The marginal *VE*(*s*_{1}) can be expressed as

The discrete hazards always condition on {*R*(1) = 1} and, henceforth, we assume this implicitly. For subjects with a particular baseline covariate *w*, a similar estimand *VE*(*s*_{1}|*w*) can be formed by conditioning on *W* = *w* in the hazards.

The population estimand *VE*(*s*_{1}) contrasts the rate of the clinical event for subjects with *S*(1) = *s*_{1} under assignment to vaccine versus under assignment to placebo. Supposing *S*(1) is bounded below by zero, with zero indicating a negative immune response, we define *S* to be a *predictive surrogate* if *VE*(0) = 0 and *VE*(*s*_{1}) > 0 for all *s*_{1} > *C* for some constant *C* ≥ 0. These conditions reflect population-level necessity and sufficiency of the immune response to achieve positive vaccine efficacy.

Under A1–A3 and the Cox model (1), the estimand equals

In equation (2), a negative value of *β*_{3} indicates that a higher immune response to vaccine predicts greater vaccine efficacy. On the other hand, *β*_{3} = 0 implies *VE*(*s*_{1}) is constant in *s*_{1}, so that the marker does not predict vaccine efficacy. Therefore, testing *H*_{0} : *β*_{3} = 0 versus *H*_{1} : *β*_{3} < 0 assesses sufficiency. A value *β*_{1} = 0 indicates necessity, and both *β*_{1} = 0 and *β*_{3} < 0 indicate the marker is a predictive surrogate. The magnitude of *β*_{3} indicates the quality of the predictive surrogate, with *β*_{3} = 0 suggesting no surrogate value [*VE*(*s*_{1}) is constant in *s*_{1}] and larger |*β*_{3}| suggesting greater surrogate value (greater predictiveness).
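The roles of *β*_{1} and *β*_{3} can be illustrated numerically. The sketch below assumes the closed form *VE*(*s*_{1}) = 1 − exp(*β*_{1} + *β*_{3}*s*_{1}); this form is not reproduced in the text here and is an assumption, chosen because it agrees with the stated interpretation (*β*_{1} = 0 gives *VE*(0) = 0, *β*_{3} = 0 gives a constant *VE*, and *β*_{3} < 0 gives efficacy increasing in *s*_{1}). The function names and coefficient values are hypothetical.

```python
import math

def ve(s1, beta1, beta3):
    """Assumed closed form: VE(s1) = 1 - exp(beta1 + beta3 * s1)."""
    return 1.0 - math.exp(beta1 + beta3 * s1)

def is_predictive_surrogate(beta1, beta3, tol=1e-12):
    """Under the assumed form, necessity (VE(0) = 0) holds when beta1 = 0,
    and sufficiency (VE(s1) > 0 for s1 > 0) then holds when beta3 < 0."""
    return abs(beta1) <= tol and beta3 < 0

# Hypothetical coefficients, for illustration only.
beta1, beta3 = 0.0, -0.5

# VE evaluated on a small grid of immune response values.
curve = [(s, ve(s, beta1, beta3)) for s in (0.0, 1.0, 2.0, 4.0)]
for s, v in curve:
    print(f"s1 = {s:.1f}  VE = {v:.3f}")
```

With these hypothetical values the curve starts at zero and rises with *s*_{1}; setting `beta3 = 0.0` instead would give the same *VE* at every *s*_{1}, which is the no-surrogate-value case described above.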