3.2. Specific surrogate estimand
By specific surrogate value, we mean the accuracy with which causal treatment effects on the biomarker
Z predict causal treatment effects on the clinical endpoint
Y (measured during a follow-up period after the biomarker is measured) for the same setting as the efficacy trial. This value may be measured with a PS estimand that we named the “causal effect predictiveness (
CEP) surface” (
Gilbert and Hudgens, 2008). This estimand conditions on not yet experiencing the clinical endpoint under either treatment assignment at the fixed time
τ (near baseline) that the biomarker is measured; however, to simplify the discussion we assume all subjects qualify for this group. In this case, for
Y a binary outcome, the
CEP estimand is defined as
CEP(z1, z0) = P(Y0 = 1 | Z1 = z1, Z0 = z0) – P(Y1 = 1 | Z1 = z1, Z0 = z0)   (1)
(or some other contrast), where
Zx (
Yx) is the potential biomarker (infection status after
τ) if assigned treatment
x, for
x = 0 (placebo) and
x = 1 (vaccine) [
Gilbert and Hudgens (2008), building on
Frangakis and Rubin (2002) and
Follmann (2006)]. This estimand measures clinical efficacy in subgroups defined by certain causal vaccine effects on the biomarker. We refer to the
CEP estimand as the “specific surrogate estimand,” and in Section 3.5 discuss how it can be used in the assessment of transportability/bridging surrogacy that is ultimately of interest.
As with the post-infection PS estimand, our research on the
CEP estimand was initially motivated by dissatisfaction with the standard estimand traditionally used for evaluating an “immune correlate” of vaccine protection, which simply measures the association between
Z1 and
Y1 using data from the vaccine group, and, where a strong association is found, an inference is ventured that the correlate of infection may be used to reliably predict vaccine efficacy (
Qin et al., 2007). However, as extensively discussed in the surrogate endpoint evaluation literature, association between
Z1 and
Y1 does not imply association between
Z1 –
Z0 and
Y1 –
Y0, and yet, for our goal of predicting vaccine efficacy, it is the latter association that is truly of interest. To illustrate how the former association may not predict the latter, suppose
Z0 = 0, Z1 = U + z1, Y0 = U + y0, and Y1 = U + y1, where U, z1, y0, y1 are all independent with U ~ N(0, σ = 10), z1 ~ N(1,1), y0 ~ N(0,1), and y1 ~ N(1,1). Then var(Z1) = var(Y1) = 101 and cov(Z1, Y1) = var(U) = 100, so cor(Z1, Y1) = 100/101; yet Z1 – Z0 = U + z1 is independent of Y1 – Y0 = y1 – y0, so cor(Z1 – Z0, Y1 – Y0) = 0, i.e.,
Z1 is an excellent predictor of
Y1 but
Z1 –
Z0 is a worthless predictor of the vaccine effect
Y1 –
Y0. The vaccine literature lacked estimands that quantify the correlation between treatment effects on the biomarker and on the outcome, and we proposed the
CEP estimand for this purpose. Evaluating
CEP(
z1,
z0) at different fixed values (
z1,
z0) amounts to a series of subgroup analyses, equivalent to the common secondary objective in clinical trials to assess how the treatment effect (e.g., vaccine efficacy) varies with baseline covariates. Here again, principal stratification is useful in defining subgroups of particular interest, such as those who do and those who do not experience a causal vaccine effect on the biomarker in question.
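The toy example above can be checked numerically. The following is a minimal simulation sketch rather than part of the original analysis: the function names (`pearson`, `simulate`), the sample size, and the seed are illustrative assumptions, and the second argument of each normal distribution is read as a standard deviation (which only matters for U).

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=100_000, seed=0):
    """Draw n subjects from the toy model.

    Returns (cor(Z1, Y1), cor(Z1 - Z0, Y1 - Y0)) computed on the sample.
    """
    rng = random.Random(seed)
    z1_levels, y1_levels, z_effects, y_effects = [], [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 10.0)        # shared factor U, sd 10 (assumed reading)
        z0 = 0.0                        # Z0 = 0 for every subject
        z1 = u + rng.gauss(1.0, 1.0)    # Z1 = U + noise with mean 1
        y0 = u + rng.gauss(0.0, 1.0)    # Y0 = U + noise with mean 0
        y1 = u + rng.gauss(1.0, 1.0)    # Y1 = U + noise with mean 1
        z1_levels.append(z1)
        y1_levels.append(y1)
        z_effects.append(z1 - z0)       # effect on the biomarker
        y_effects.append(y1 - y0)       # effect on the outcome (U cancels)
    return pearson(z1_levels, y1_levels), pearson(z_effects, y_effects)


if __name__ == "__main__":
    cor_levels, cor_effects = simulate()
    print(f"cor(Z1, Y1)           = {cor_levels:.4f}  (theory: {100 / 101:.4f})")
    print(f"cor(Z1 - Z0, Y1 - Y0) = {cor_effects:.4f}  (theory: 0)")
```

The shared factor U drives both Z1 and Y1, which produces the near-perfect association in the vaccine group, but it cancels in Y1 – Y0, so the biomarker effect carries no information about the outcome effect.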
Paraphrasing a question from Ross Prentice, “Why not instead focus research on discovering actual baseline covariates that predict vaccine efficacy?” The first part of the answer is that there often exist immune biomarkers measured after the immunizations that are much stronger efficacy predictors than any baseline covariates, simply because the post-immunization biomarkers are selected to putatively measure the functional immune response that kills the pathogen of interest before it can establish infection (see
Plotkin, 2010, for a review). But, the objection continues, the PS estimand is less useful than an actual baseline covariate because (
Z1,
Z0) can never be measured simultaneously on the same individual. While true for many vaccine efficacy trials, this is false for the large special class of trials that only enroll subjects without previous infection with the pathogen. For such trials,
Z0 is known to be zero/negative for all subjects, because the laboratory instrument used to measure
Z is designed to detect only a pathogen-specific immune response. For this class the estimand simplifies to
CEP(z1) = P(Y0 = 1 | Z1 = z1) – P(Y1 = 1 | Z1 = z1).   (2)
This simplification implies (
Z1,
Z0) are observed simultaneously for subjects assigned vaccine, greatly aiding identifiability (addressed briefly in Section 4). Moreover, this simplification is appealing for the parsimony of studying how clinical efficacy varies with a univariate (or low-dimensional) biomarker
Z1, and was an important motivator for us to use the
CEP estimand in our HIV vaccine efficacy trials research.
In addition, due to the lack of common support of the vaccine and placebo group biomarker distributions, treatment effects are undefined within all subgroups with observed biomarker
Z =
z for
z > 0, such that the
Prentice (1989) approach (Chan et al., 2002;
Gilbert and Hudgens, 2008) does not apply. The utility of the natural direct/indirect effect approach in this setting is also not clear (see Section 3.6). Therefore, in our motivating application principal stratification appears to yield the only well-defined estimand for assessing surrogate value. In other settings with
Z0 variable, the estimand (2) may still be useful, as the ability to predict
Y1 –
Y0 from
Z1 alone regardless of
Z0 would be useful for vaccine development (
Wolfson and Gilbert, 2010); generally what is sought is accurate prediction of
Y1 –
Y0 based on any baseline covariate information (i.e., actual baseline covariates and/or
Z1 and
Z0, which are treated as baseline covariates).
3.3. Utility of the specific surrogate estimand for selecting the primary biomarker endpoints in follow-up Phase I/II vaccine trials
For a given field of researchers working to develop a vaccine against a certain pathogen, a handful of pivotal vaccine efficacy trials are conducted over a period of decades, and the generated data are used to make decisions on the immune biomarkers to use as primary study endpoints in subsequent Phase I/II trials that evaluate and compare a number of refined candidate vaccine regimens. Typically no direct data on clinical efficacy in new settings are available for informing these decisions, such that the decisions are traditionally based on the primary efficacy data together with the observed associations between the immune biomarkers and the clinical outcome within the efficacy trials (i.e.,
Z1 and
Y1), informally combined with theories/models of mechanisms of protection. In particular, for many pathogens the first efficacy trial demonstrates partial vaccine efficacy that is too low to warrant licensure, which makes it a top priority to assess immune correlates for guiding the selection of immune biomarker endpoints in follow-up trials (e.g., such immune correlates assessment is now occurring for the first trial to show low-level efficacy of an HIV vaccine,
Rerks-Ngarm et al., 2009).
As an improvement to the traditional approach that selects biomarkers with the strongest (Z1,Y1) associations, we suggest utilizing the CEP(z1) curve to select biomarkers with the strongest (Z1,Y1 – Y0) associations. Biomarkers with CEP(z1) large for some range of z1 and CEP(z1) small for z1 equal to or near zero may be prioritized as primary endpoints. Given selection of the best biomarker, the follow-up Phase I/II trials would rank the vaccine regimens by the proportion of vaccine recipients with immune response z1 in the estimated high-protection range, forming the basis for advancing the most promising vaccine regimen to the next efficacy trial. Because accurate prediction of vaccine efficacy internal to an efficacy trial does not imply accurate prediction to a new setting where the clinical outcome is not measured, it is important to address if and how the CEP estimand may be useful for this purpose; we begin this discussion in Section 3.5.
3.4. Remarks on evaluating bridging surrogates
Based on the above discussion, theories of mechanisms of protection must be combined with an empirically supported specific surrogate to make a bridging prediction, and the accuracy of the prediction depends on the veracity of the theory. Building credible theories has often been more achievable in the preventive vaccine setting than for many chronic disease settings, because the biological pathways of treatment effects on
Z and
Y are often better understood and these pathways can be studied more readily in the lab, due to the specificity of the biomarker and of the infection endpoint. For example, theories have commonly proposed that functional antibodies (e.g., neutralizing) directed to certain pathogen epitopes are protective against infection, and manipulation experiments are conducted (e.g., antibody infusion challenge experiments in animals or humans) to provide evidence that the functional antibodies actually kill the pathogen before it can establish infection (
Plotkin, 2008). The nature of the bridge is fundamental to the needed theory. If the only change from the conditions of the efficacy trial is adding a fourth pathogen strain to the existing 3-strain vaccine (e.g., for influenza), then it may be relatively easy to develop a compelling biological theory justifying accurate bridging, whereas if a new vaccine formulation is tested in a new population against new circulating virus types, then the needed biological theory will be more elaborate, raising the bar for credibility.
Pearl and Bareinboim (2011) suggest it is useful to mathematically formalize the process by which evaluating a predictive biomarker in an experimental setting is combined with theory/assumptions to yield accurate bridging, a point with which we agree. As noted above, vaccine development over the past 60 years has proceeded by informally combining the two elements, which, while not ideal, has worked reasonably well, as judged by the fact that the identification and assessment of efficacy-predictive immune biomarkers has ubiquitously played a central role in the development and deployment of vaccines, many of which were confirmed over the course of decades to confer high levels of vaccine efficacy in many populations (
Falk and Ball, 2001;
Plotkin, 2010). However, the use of a formal mathematical framework for bridging may have allowed for even greater success, and future research in this area seems merited. Similarly, a formal framework is needed for understanding when the
CEP estimand provides reliable guidance for bridging. We sketch a start to this problem in the next section.
3.5. Toward criteria for reliable bridging based on the CEP estimand
Consider a new setting different from that in the efficacy trial, which may entail a new vaccine regimen, a new study population, or both. First consider the case of a new vaccine and the same study population. To illustrate the bridging problem (which is realistic for HIV vaccine efficacy trials), suppose the initial efficacy trial demonstrates partial vaccine efficacy that is promising but too low to warrant licensure, and also identifies a promising biomarker, with the estimated CEP(0) near zero (i.e., supporting average causal necessity of a vaccine-induced immune response for protection,
Gilbert and Hudgens, 2008) and the estimated CEP(z1) increasing monotonically in
z1. These results stimulate research on various refined candidate vaccines, leading to the advancement of a promising new vaccine to a follow-up Phase II trial in the identical population that was studied in the efficacy trial (identical inclusion and exclusion criteria), which shows that the distribution of the immune biomarker is substantially shifted upwards compared to that for the previous vaccine. The field of vaccine researchers hopes that the new vaccine improves the overall vaccine efficacy.
Similar to
Pearl and Bareinboim (2011), our formulation for evaluating bridging envisages two experiments, the original efficacy trial and the follow-up Phase II trial, and considers conditions for transportability. The Phase II trial randomizes subjects to the new vaccine or new placebo (
X = 1′ or
X = 0′), and uses an identical procedure for measuring the same biomarker as was used in the efficacy trial, yielding information on
Z. However, information on the outcome
Y is not collected and therefore interest focuses on predicting
CEnew ≡ P(Y0′ = 1) – P(Y1′ = 1), i.e., the overall effect on Y of the new vaccine. The overall effect on Y of the original vaccine can be expressed as CE ≡ P(Y0 = 1) – P(Y1 = 1) = ∫ CEP(z1) dF(z1), and the overall effect on Y of the new vaccine can be expressed as CEnew = ∫ CEPnew(z1) dF′(z1), where F is the cdf of Z1, F′ is the cdf of Z1′, and CEPnew(z1) ≡ P(Y0′ = 1 | Z1′ = z1) – P(Y1′ = 1 | Z1′ = z1). Here for simplicity we focus on the common special case that
Z0 and
Z0′ are constant. The field of vaccine researchers receives reliable guidance about bridging efficacy if
CEnew can be accurately predicted, suggesting a general criterion for
Z to be a useful bridging surrogate:
[Bridging Surrogate Criterion.] Z is a useful bridging surrogate if CEnew can be accurately predicted based on CEP(z1) and F from the efficacy trial and F′ from the follow-up trial.
In particular, the field of vaccine researchers hopes for accurate prediction in the following way: if a new vaccine is selected for efficacy testing based on the criterion that Phase I/II trials demonstrate increases in the percentage of vaccine recipients with z1 values in regions where the estimated CEP(z1) from the previous efficacy trial is high, then the selected vaccine is accurately predicted to have
CEnew >
CE. That is, successfully modifying the vaccine based on the biomarker reliably leads to a more efficacious vaccine.
Following
Gilbert and Hudgens (2008), if
Z1 and
Z1′ have the same support, then the prediction of
CEnew may be based on
CEnew = ∫ ψ(z1) CEP(z1) dF′(z1),   (3)
where ψ(z1) ≡ CEPnew(z1)/CEP(z1) (with convention 0/0 = 1). This equation re-weights the original
CEP curve by two factors: the relationship between
CEPnew(
z1) and
CEP(
z1) for each value of
z1 and the distribution of
Z1′ for the new vaccine. A numerical prediction is obtained by substituting estimates for
CEP(·) and
F′(·) into (3) and by assuming a fully specified form for
ψ(·); therefore the prediction combines empirical evidence with a bridging assumption. A perfectly accurate prediction is obtained if
CEPnew(·) =
CEP(·), i.e.,
ψ(·) = 1. If this perfect bridging assumption holds, then
CEnew can be accurately predicted by
∫ CEP(z1) dF′(z1),   (4)
evaluated at the estimates of CEP(·) and F′(·). Expression (4) is similar in spirit to the “transport formula” of Pearl and Bareinboim [2011,
equation (5)], except in (4) we are integrating over principal strata rather than observed biomarker levels. In words, the perfect bridging assumption
ψ(·) = 1 states that given a vaccine induces an immune response
z, the protective effect (on
Y) will be the same regardless of whether it was the new or original vaccine that induced the immune response, and regardless of any differences in the placebos used in the two studies. Note that the perfect bridging assumption
ψ(·) = 1 implies transportability of the average causal necessity condition from the efficacy trial to the new trial:
CEP(0) = 0 implies
CEPnew(0) = 0.
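To make the perfect bridging prediction concrete: when ψ(·) = 1, the predicted overall effect of the new vaccine is the original CEP curve integrated against the new biomarber distribution F′, while the original overall effect integrates the same curve against F. The sketch below illustrates this with a discrete biomarker. Everything numerical in it is hypothetical: the curve `cep_hat`, the two biomarker distributions, and the helper name `integrate_cep` are assumptions chosen only to show the mechanics.

```python
import math


def cep_hat(z1):
    """Hypothetical estimated CEP curve: zero at z1 = 0 and increasing in z1."""
    return 0.08 * (1.0 - math.exp(-z1))


def integrate_cep(cep, pmf):
    """Discrete version of the integral of CEP(z1) against a biomarker cdf.

    `pmf` maps each biomarker level z1 to its probability.
    """
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("biomarker probabilities must sum to 1")
    return sum(cep(z1) * prob for z1, prob in pmf.items())


# Hypothetical distribution of Z1 in vaccine recipients of the efficacy trial (F).
F_ORIGINAL = {0.0: 0.40, 1.0: 0.40, 2.0: 0.20}
# Hypothetical distribution of Z1' in the follow-up Phase II trial (F'),
# shifted upwards relative to F.
F_NEW = {0.0: 0.10, 1.0: 0.40, 2.0: 0.50}


if __name__ == "__main__":
    ce_original = integrate_cep(cep_hat, F_ORIGINAL)
    ce_new_predicted = integrate_cep(cep_hat, F_NEW)
    print(f"CE (original vaccine):           {ce_original:.4f}")
    print(f"Predicted CEnew under psi = 1:   {ce_new_predicted:.4f}")
```

Because the assumed curve is increasing and F′ places more mass on high biomarker levels, the predicted CEnew exceeds CE. The prediction is only as credible as the assumption ψ(·) = 1.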
On a case-by-case basis, vaccine researchers must deliberate the plausibility of the perfect bridging equality. One way it would fail is if the additional group of individuals achieving an immune response in the high-protection range with the new vaccine differs in a critical way from the subgroup that achieved the high-protection range with the original vaccine in the efficacy trial; for example, the originally protected subgroup may have all possessed a critical (unmeasured) host genotype that is absent in the additional group.
If perfect bridging (
ψ(·) = 1) fails, imperfect but useful bridging may still be achieved, depending on the nature of the departure of
ψ(·) from unity. Even if
ψ(·) does not equal 1, (4) should provide a reasonable estimate of
CEnew provided
CEP(
z1) ≈
CEPnew(
z1) for
z1 where
dF′ (
z1) is large. While
ψ(·) is not identifiable without evaluating the new vaccine in an efficacy trial, a sensitivity analysis may be conducted where one considers how the predicted CEnew changes with different assumed forms for ψ(·). For example, if the cautious assumption is made that ψ(·) = 1/2, is the predicted CEnew still sufficiently large to justify moving forward with a new efficacy trial?
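A sensitivity analysis of this kind can be sketched as follows. Since ψ(z1) CEP(z1) = CEPnew(z1) by definition, the prediction for an assumed ψ(·) is the average of ψ(z1) CEP(z1) over the biomarker values observed for the new vaccine. The CEP curve, the Phase II biomarker readouts, the three forms of ψ(·), and the decision threshold `MIN_CE_FOR_TRIAL` are all hypothetical values chosen for illustration.

```python
import math


def cep_hat(z1):
    """Hypothetical estimated CEP curve from the efficacy trial."""
    return 0.08 * (1.0 - math.exp(-z1))


# Hypothetical biomarker readouts Z1' from new-vaccine recipients in Phase II.
Z1_NEW_SAMPLE = [0.0, 0.4, 0.9, 1.3, 1.8, 2.1, 2.5, 3.0]

# Assumed forms for psi(z1) = CEPnew(z1) / CEP(z1); none is identifiable from data.
PSI_FORMS = {
    "perfect": lambda z1: 1.0,                        # psi = 1
    "cautious": lambda z1: 0.5,                       # psi = 1/2
    "attenuating": lambda z1: 1.0 / (1.0 + 0.5 * z1),  # psi shrinks as z1 grows
}

# Hypothetical minimum predicted effect that would justify a new efficacy trial.
MIN_CE_FOR_TRIAL = 0.02


def predict_ce_new(cep, psi, z1_sample):
    """Average of psi(z1) * CEP(z1) over the sample, i.e. the integral
    against the empirical cdf of the new-vaccine biomarker."""
    return sum(psi(z1) * cep(z1) for z1 in z1_sample) / len(z1_sample)


def sensitivity_table():
    """Predicted CEnew under each assumed form of psi."""
    return {
        label: predict_ce_new(cep_hat, psi, Z1_NEW_SAMPLE)
        for label, psi in PSI_FORMS.items()
    }


if __name__ == "__main__":
    for label, predicted in sensitivity_table().items():
        verdict = "proceed" if predicted >= MIN_CE_FOR_TRIAL else "do not proceed"
        print(f"{label:12s} predicted CEnew = {predicted:.4f} -> {verdict}")
```

A constant ψ simply rescales the perfect bridging prediction, whereas a ψ that varies with z1 changes which biomarker levels contribute most, so it is worth examining both kinds of departure.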
If a second efficacy trial is conducted with the new vaccine, then transportability is supported by a numerical prediction of CEnew [obtained from (4)] near the estimate of
CEnew obtained in the primary analysis of the new efficacy trial which ignores the biomarker data.
Gilbert and Hudgens (2008) note that even in the absence of a second efficacy trial, a partial check of transportability (or ‘projective validity’) can be conducted by cross-validation of data from the first efficacy trial. In particular, individuals in the trial can be partitioned into two subgroups. Then the
CEP curve can be estimated from subgroup 1 data and
CE can be predicted for subgroup 2 based on the observed distribution of
Z1 in subgroup 2 and the estimated
CEP curve from subgroup 1. Transportability across subgroups is supported if the predicted
CE in subgroup 2 is similar to the estimate of
CE in that subgroup which ignores data on
Z1.
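The cross-validation check can be illustrated on simulated data. This is a sketch under strong simplifying assumptions that are not part of the procedure described above: the biomarker takes three discrete levels, Z0 = 0 for everyone, and Z1 is treated as known for placebo recipients as well as vaccine recipients (in a real trial Z1 is unobserved in the placebo group and the CEP curve must be estimated with the identifiability methods referred to in Section 4). All risks, probabilities, and function names are hypothetical.

```python
import random

# Hypothetical data-generating values (not estimates from any real trial).
Z1_LEVELS = [0, 1, 2]
Z1_PROBS = [0.3, 0.4, 0.3]
PLACEBO_RISK = 0.10
VACCINE_RISK = {0: 0.10, 1: 0.06, 2: 0.02}   # true CEP = 0, 0.04, 0.08


def simulate_trial(n, rng):
    """Return rows (x, z1, y): randomized arm, potential biomarker Z1, outcome."""
    rows = []
    for _ in range(n):
        x = rng.randrange(2)
        z1 = rng.choices(Z1_LEVELS, weights=Z1_PROBS)[0]
        risk = VACCINE_RISK[z1] if x == 1 else PLACEBO_RISK
        rows.append((x, z1, 1 if rng.random() < risk else 0))
    return rows


def risk(rows, arm, z1=None):
    """Observed outcome proportion in one arm, optionally within a Z1 level."""
    ys = [y for (x, z, y) in rows if x == arm and (z1 is None or z == z1)]
    return sum(ys) / len(ys)


def estimate_cep(rows):
    """CEP(z1) = P(Y0 = 1 | Z1 = z1) - P(Y1 = 1 | Z1 = z1) for each level."""
    return {z: risk(rows, 0, z) - risk(rows, 1, z) for z in Z1_LEVELS}


def predict_ce(cep, rows):
    """Average the CEP curve over Z1 observed in the vaccine arm of `rows`."""
    z_vaccine = [z for (x, z, _) in rows if x == 1]
    return sum(cep[z] for z in z_vaccine) / len(z_vaccine)


def cross_validate(n=40_000, seed=1):
    """Estimate CEP in subgroup 1, predict CE in subgroup 2, and return
    (predicted CE, direct CE estimate that ignores Z1) for subgroup 2."""
    rng = random.Random(seed)
    rows = simulate_trial(n, rng)
    rng.shuffle(rows)
    half = len(rows) // 2
    subgroup1, subgroup2 = rows[:half], rows[half:]
    predicted = predict_ce(estimate_cep(subgroup1), subgroup2)
    direct = risk(subgroup2, 0) - risk(subgroup2, 1)
    return predicted, direct


if __name__ == "__main__":
    predicted, direct = cross_validate()
    print(f"Predicted CE in subgroup 2: {predicted:.4f}")
    print(f"Direct CE in subgroup 2:    {direct:.4f}")
```

Here the two subgroups come from a random split, so agreement is expected by construction. A more demanding check would partition on a baseline characteristic so that the subgroups differ in a meaningful way.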
Next we suppose the follow-up Phase I/II trial is done with a new vaccine in a new population. In this case the bridging criterion described above carries over under a slight modification that accounts for different distributions of baseline covariates
W in the two settings. Specifically, the
CEP estimand now conditions on
W,
CEP(
z1,
w) =
P(
Y0 = 1|
Z1 =
z1,
W =
w) –
P(
Y1 = 1|
Z1 =
z1,
W =
w), and the integrations in (3) and (4) are replaced with integrations over the joint distribution of
Z1 and
W, now requiring common support of this joint distribution for the old and new settings. A challenge with the bridging criterion is that the numerical prediction of CEnew may be inaccurate if the baseline covariates are inadequately informative about disease risk to fully adjust for differences in risk between the two settings. In addition, the bridging criterion relies on a particular functional contrast between the conditional disease risks under the two treatment assignments specified by the
CEP estimand; we have focused on a difference on the additive scale. These challenges may be especially problematic if the placebo group disease incidence differs substantially between the two settings. For scenarios where the support of the biomarker and/or the other covariates differs between the old and new settings, additional research is needed to delineate if and how the
CEP estimand may be useful for assessing bridging surrogate utility.
3.6. Natural versus principal strata direct/indirect effects
Pearl suggests that the PS direct effect (PSDE) estimand (
VanderWeele, 2008) is inadequate for measuring mediation of a treatment effect and is generally less interesting scientifically (especially for identifying and explaining causal mechanisms) than the natural direct effect (NDE) estimand of
Robins and Greenland (1992) and
Pearl (2001). In a sense, this comment does not apply to our specific surrogate estimand as we use it in vaccine efficacy trials– not for measuring anything about the causal biological mechanism of protection, but merely for measuring a biomarker’s predictiveness of vaccine efficacy. Therefore on the one hand we do not view the PS estimand
CEP [given either by (1) or (2)] as a competitor with other causal estimands trying to identify and explain causal mechanisms; it simply has a different purpose, prediction.
On the other hand, the
CEP estimand at
Z1 =
Z0 is the PSDE, which raises the question as to whether this estimand or the alternative NDE estimand is of greater value for the surrogate endpoint problem in vaccine efficacy trials. To address this, we first review the definition of the NDE estimand, which considers the potential clinical endpoint
Yx,
z under assignment to both
X =
x and
Z =
z, thus requiring that the biomarker
Z is manipulable. By consistency
Yx,
Zx =
Yx, i.e., the potential outcome when
X is set to
x and
Z is set to
Zx is the same as when
X is set to
x and
Z is not manipulated and therefore (naturally) takes on the value
Zx. The average NDE (
Pearl 2001,
equation 6) estimand can be defined as the contrast between E[Y1,Z1] and E[Y0,Z1],   (5)
i.e., the average effect of treatment when setting the intermediate
Z to the value it would have been with treatment (i.e., when
X = 1). This estimand is entirely symmetric such that a second average NDE estimand is defined as the contrast between E[Y1,Z0] and E[Y0,Z0],   (6)
i.e., the average effect of treatment when setting
Z to the value it would have been without treatment (i.e., when
X = 0). Thus in considering the NDE estimand for the vaccine setting, one needs to conceive of either placebo recipients having their immune responses set to
Z1, as in (5), or vaccine recipients having their immune responses set to
Z0, as in (6).
With that background, we are sympathetic to Pearl’s statement, “...it is hard to accept the PSDE restriction that nature’s pathways should depend on whether we have the technology to manipulate one variable or another,” but only for a certain category of manipulations. In particular, we distinguish between manipulations that may not be possible now but conceivably can be developed, versus manipulations that can never conceivably be developed. An example of the former is a controlled direct effect estimand (
Pearl, 2001) that sets all subjects to be fully compliant to the assigned inoculations; while this manipulation may be unachievable in an efficacy trial, once an excellent vaccine is licensed, many individuals will receive it, conceivably even those who would have been non-compliant in the efficacy trial (
Robins and Greenland, 1996). We suggest that for HIV vaccine trials (for which
Z0 = 0 for all subjects) both NDE estimands (5) and (6) are examples of the latter. First, estimand (5) requires that placebo recipients can conceivably have their HIV-specific immune response
Z set to exceed 0; however
Z is measured using an immunoassay that mixes certain HIV peptides/isolates with the individual’s blood sample, and, by the nature of the adaptive immune system, HIV antigenic exposure (created by HIV vaccination or natural exposure) is the only thing that could stimulate a positive HIV-specific response. Similarly, estimand (6) requires that all vaccine recipients can be manipulated to have
Z = 0, which is also difficult to conceive given the nature of the assay for measuring
Z. Many others have suggested that causal estimands requiring inconceivable manipulations are of dubious scientific value (e.g.,
Holland, 1986;
Angrist, Imbens, and Rubin, 1996).
VanderWeele (2008) wrote, “Whether it is reasonable to consider counterfactual variables of the form
Yxz will depend on whether an intervention on the intermediate variable is conceivable,” and “Principal strata direct and indirect effects have the advantage that the concepts are defined irrespective of whether an intervention on the intermediate variable is conceivable.” Specifically addressing the surrogate endpoint problem,
Gallop et al. (2009) and
Joffe and Greene (2009) made the same point.
However, it is not easy to definitively answer the question as to whether a conceivable manipulation exists; a negative answer produced by a feeble imagination could be reversed by a fertile one. For the vaccine example and NDE estimand (5), we can imagine manipulations to set
Z > 0 in placebo recipients. For instance, in passive immunization experiments, antibodies or T cells from another individual or stimulated in vitro may be transferred to an unvaccinated individual. However, such a manipulation poses another difficulty to using the NDE estimand: the consistency assumption (
Cole and Frangakis, 2009) becomes dubious. In particular, consistency implies that the outcome for an individual observed to have
Z1 =
z1 > 0 when vaccinated (
X = 1) would be the same as if we set
Z =
z1 through passive immunization.
Thus use of the natural direct/indirect approach may require strong assumptions about manipulation and consistency, a problem not faced by the PS estimand. We conclude there are unsolved challenges posed to use of the NDE estimand for our motivating class of efficacy trials with Z0 constant.