


J R Stat Soc Series B Stat Methodol. Author manuscript; available in PMC 2010 March 13.

Published in final edited form as:

J R Stat Soc Series B Stat Methodol. 2007 November 1; 69(5): 879–901.

doi: 10.1111/j.1467-9868.2007.00615.x. PMCID: PMC2837843

NIHMSID: NIHMS68127

Yannis Jemiai, Cytel Inc., Cambridge, USA;


**Summary**

We consider estimation, from a double-blind randomized trial, of treatment effect within levels of base-line covariates on an outcome that is measured after a post-treatment event **E** has occurred, in the subpopulation *P*_{E,E} that would experience event **E** regardless of treatment. Specifically, we consider estimation of the parameters γ indexing models for the outcome mean conditional on treatment and base-line covariates in the subpopulation *P*_{E,E}. Such parameters are not identified from randomized trial data but become identified if it is additionally assumed that the subpopulation *P*_{Ē,E} of subjects who would experience the event under one treatment but not under the other is empty (monotonicity) and that a specified tilting transformation relates the distributions of the outcome in the subpopulations *P*_{E,Ē} and *P*_{E,E}.

**1. Introduction**

In this paper we consider the problem of estimating, from double-blind randomized trials, the effect of a treatment on an outcome that is measured after a certain post-treatment event **E** has occurred. Our work is motivated by the need to develop adequate methodology for addressing the important question in human immunodeficiency virus (HIV) vaccine research of whether exposure to an imperfect vaccine (i.e. one that can prevent infection in some but not all recipients) has an effect on the progression of disease after infection. In this context, the relevant question is whether exposure to vaccine prevents or delays the onset of acquired immune deficiency syndrome in those in whom it fails to prevent infection. In particular, the goal is to compare post-infection outcomes in the subpopulation *P*_{E,E} of individuals for whom, in a placebo-controlled vaccine trial, infection (the event **E**) would occur regardless of whether they received vaccine or placebo.

Counterfactual, also referred to as potential, outcomes (Rubin, 1978; Robins, 1986) are useful tools for constructing causal contrasts measuring treatment effects. Our problem is peculiar in that, although in principle it may be possible to conceptualize counterfactual outcomes for all subjects in the population, such outcomes do not hold any relevant meaning except in the subpopulation *P*_{E,E}. For example, counterfactual markers of disease progression after infection, such as viral load 3 months after infection, could in principle be defined for everyone (by envisioning that infection could somehow be forced by intervention in everyone). However, they are irrelevant variables except for the subpopulation in which infection cannot be prevented, as this is the pertinent analysis subpopulation. Another setting in which a similar situation arises is when interest lies in causal contrasts that are defined in terms of outcomes that can be censored by death. For example, in a cancer trial, we may be interested in determining which of two competing chemotherapy treatments would result in better quality of life in those who would survive a fixed period of time after the end of treatment.

The importance of estimation of quantities that are defined only in the subpopulation in which outcomes hold meaning was first discussed in the context of outcomes that are censored by death by Kalbfleisch and Prentice (1980). As far as we know, Robins (1986), remark 12.2, is the earliest reference that considers inference about causal effects in the *P*_{E,E}-subpopulation of subjects who would survive under either treatment. Robins (1986) considered time-to-event outcomes but his formulation applies likewise to outcomes that are measured at a prespecified time point. Later, Robins (1995) provided a set of strong untestable assumptions under which the sharp null hypothesis of no causal effect in the *P*_{E,E}-subpopulation can be tested.

Because causal contrasts in the subpopulation *P*_{E,E} are not identified even from randomized trial data, two approaches have been considered in the literature to address the non-identifiability problem. The first approach considers inference about sharp bounds for the causal contrasts (Hudgens *et al.*, 2003; Jemiai and Rotnitzky, 2003; Zhang and Rubin, 2003). Sharp bounds determine the range of all possible values of the causal contrast that are compatible with the observed data distribution. This approach has the advantage of providing a most objective assessment of the treatment effect in the light of the available information. However, it is uninformative about what conditions might give rise to each value in the range.

The second approach considers assumptions under which the causal contrasts are identified from randomized trial data. These include postulating values for easily interpretable features of the distribution of the counterfactuals. The causal contrasts are then estimated regarding the features’ value as fixed and known, and the estimation procedure is repeated under various such values within a plausible range in a form of sensitivity analysis (Gilbert *et al.*, 2003; Hayden *et al.*, 2005). In particular, Gilbert *et al.* (2003) described an estimator of the mean difference of the treatment-specific counterfactual outcomes in the subpopulation *P*_{E,E} under the following identifying assumptions:

- the subpopulation *P*_{Ē,E} of subjects who would experience the event under the second treatment, say B, but not under the first treatment, say A, is empty (this assumption is often referred to as monotonicity; Angrist *et al.* (1996), Gilbert *et al.* (2003) and Zhang and Rubin (2003));
- the distributions of the treatment A counterfactual outcome in *P*_{E,E} and in the subpopulation *P*_{E,Ē} of subjects that would experience the event under A but not under B are related by a specified tilting transformation, which is indexed by a function *g*.

Gilbert *et al.* (2003) recommended repeating the estimation each time regarding a different tilting transformation as known, as a form of sensitivity analysis.

One attractive feature of the approach of Gilbert *et al.* (2003) is that inference about treatment effects is conducted under a flexible semiparametric model that makes no distributional shape assumptions and, as such, is protected from the possibility of distributional shape misspecifications. However, Gilbert *et al.* (2003) did not consider the estimation of treatment effects conditional on high dimensional base-line covariates. Yet, these conditional effects are important to understand whether and how the effect of treatment varies across levels of base-line covariates. In this paper, we develop extensions of the semiparametric approach of Gilbert *et al.* (2003) that allow the estimation of treatment effects conditional on covariates. For this, we introduce a new model, the *P*_{E,E}-marginal structural mean model, which postulates that the conditional mean of the treatment-specific counterfactual outcome given base-line covariates follows, in the subpopulation *P*_{E,E}, a parametric model indexed by a parameter γ. Estimation of γ relies on additional assumptions, among them:

- (c) a parametric model for the conditional probability that a subject experiences the event **E** if assigned to treatment A, given that the subject would experience the event if assigned to treatment B, his or her outcome under B and his or her pretreatment covariates.

We derive a locally efficient estimator of γ in the semiparametric model that is defined by the *P*_{E,E}-marginal structural mean model and assumptions (a)–(c), which is guaranteed to achieve the semiparametric variance bound whenever a working parametric model for the distribution of the treatment B counterfactual outcome given the covariates in the population *P*_{E,E} is correctly specified. This work is a sequel to Shepherd *et al.* (2006).

**2. The set-up**

Consider an experiment which randomizes *n* subjects, who are independently selected from a given population of interest, to one of two treatments. A vector *X _{i}* of base-line covariates is recorded for each subject *i* before treatment *Z _{i}* ∈ {0, 1} is assigned; after treatment, we record the indicator *S _{i}* of whether the event **E** occurs and, if *S _{i}* = 1, the outcome *Y _{i}*.

To define the treatment effect of interest and our model, we use counterfactual random variables (Neyman, 1990; Rubin, 1978; Robins, 1986). Specifically, for each *z* ∈ {0, 1}, define *S _{i}*(*z*) to be the indicator of whether subject *i* would experience the event **E** if assigned to treatment *z*, and *Y _{i}*(*z*) to be the outcome that subject *i* would have under treatment *z*. These counterfactuals are linked to the observed data through the consistency assumption

$$({S}_{i}(z),{Y}_{i}(z))=({S}_{i},{Y}_{i})\quad \text{if}\ {Z}_{i}=z.$$

(1)

We assume that the vectors *W _{i}* = (*X _{i}*, *Z _{i}*, *S _{i}*(0), *S _{i}*(1), *Y _{i}*(0), *Y _{i}*(1)), *i* = 1, …, *n*, are independent and identically distributed, and that treatment assignment is randomized, i.e.

$$Z\coprod (S(0),S(1),Y(1),Y(0))|X\text{with probability}\phantom{\rule{thinmathspace}{0ex}}1,$$

(2)

because counterfactual random variables are unobserved base-line characteristics of each individual. Here, *A* ⫫ *B* | *C* indicates conditional independence of *A* and *B* given *C* (Dawid, 1979).

In the causal inference literature, a causal effect measure on the outcome of interest (say, viral load) is defined as some measure of discrepancy between the distributions of *Y*(0) and *Y*(1) in the target population. However, this measure is irrelevant when, as in our context, both *Y*(0) and *Y*(1) only hold meaning in the subpopulation *P*_{E,E} of subjects for whom *S*(0) = *S*(1) = 1. We therefore focus on the conditional means *m*(*z*, *X*) ≡ *E*{*Y*(*z*)|*S*(0) = *S*(1) = 1, *X*}, *z* ∈ {0, 1}.

**3. Identification**

Randomization is insufficient to identify *m*(*z*, *X*) from the observed data *O _{i}*, *i* = 1, …, *n*, where *O _{i}* = (*X _{i}*, *Z _{i}*, *S _{i}*, *S _{i}* *Y _{i}*). We therefore consider the following additional assumptions.

*Assumption 1 (monotonicity)*. Pr{*S*(1) = 1, *S*(0) = 0|*X*} = 0 with probability 1.

*Assumption 2 (positivity assumption 1)*. Pr{*S*(1) = 0, *S*(0) = 1|*X*} > 0 with probability 1.

*Assumption 3 (positivity assumption 2)*. Pr{*S*(1) = 1, *S*(0) = 1|*X*} > 0 with probability 1.

*Assumption 4 (pattern mixture assumption)*.

$$f\{Y(0)|S(1)=S(0)=1,X\}=\frac{f\{Y(0)|S(1)=0,\phantom{\rule{thinmathspace}{0ex}}S(0)=1,X\}}{c(X)}\phantom{\rule{thinmathspace}{0ex}}\text{exp}[g\{X,Y(0)\}]$$

(3)

where *g*(·,·) is a known function and, for each *X* = *x*, *c*(*x*) is a normalizing constant.

Assumption 1 implies that the subpopulation *P*_{Ē,E} is empty. In contrast, assumptions 2 and 3 imply that *P*_{E,Ē} and *P*_{E,E} respectively are non-empty. In the context of placebo-controlled vaccine trials, *P*_{Ē,E}, *P*_{E,Ē} and *P*_{E,E} correspond to the subpopulations of *harmed*, *protected* and *always infected* individuals, as defined by Gilbert *et al.* (2003). Under randomization (2), assumption 1 implies the testable restriction Pr(*S* = 1|*Z* = 0, *X*) ≥ Pr(*S* = 1|*Z* = 1, *X*). In the analysis of vaccine trials it might be reasonable to assume assumption 1 *a priori* if the treatments being compared are vaccine (*z* = 1) and placebo (*z* = 0). However, this assumption would usually not be defensible if the treatments are two experimental vaccines. Under randomization (2) and assumption 1, assumption 2 is also testable as it implies the observed data law restriction Pr(*S* = 1|*Z* = 0, *X*) > Pr(*S* = 1|*Z* = 1, *X*). In assumption 4, the function *g* calibrates the discrepancy between the conditional distributions of the outcome under treatment *Z* = 0 given covariates *X* in the subpopulations *P*_{E,Ē} and *P*_{E,E}. In particular, *g* = 0 establishes that these distributions are identical. In the placebo-controlled vaccine trial setting and with viral load being the outcome, the assumption *g* = 0 would hold if, among subjects who are infected if they receive placebo, the covariates *X* are all the simultaneous predictors of

- the potential outcome after infection under placebo and
- the potential infection state under vaccine

Note also that assumption 4 carries the implicit assumption that, within levels of *X*, the support of *Y*(0) is the same in the subpopulations *P*_{E,Ē} and *P*_{E,E}. In the context of placebo-controlled vaccine trials, this equal support assumption establishes that, if there is an always infected subject who would attain a given viral load value following infection after receiving placebo, then there also is a protected individual who, under the same circumstances, would attain the same viral load value.
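As a concrete illustration of the tilt in equation (3), the following sketch (toy numbers of our own; the linear tilt *g*(*y*) = β*y* is the form that Gilbert *et al.* (2003) focus on) computes the implied always-infected distribution from a hypothetical discrete protected-stratum distribution:

```python
import math

# Toy discrete version of assumption 4 with tilt g(y) = beta * y:
# f{Y(0) | always infected} is proportional to exp(beta*y) * f{Y(0) | protected}.
# The support and probabilities below are illustrative choices, not data.
y_support = [1.0, 2.0, 3.0, 4.0]            # hypothetical log viral load values
f_protected = [0.1, 0.4, 0.4, 0.1]          # f{Y(0) | S(1) = 0, S(0) = 1, X = x}
beta = 0.8                                  # analyst-chosen sensitivity parameter

# Normalizing constant c(x) of equation (3), then the tilted density
c = sum(p * math.exp(beta * y) for p, y in zip(f_protected, y_support))
f_always = [p * math.exp(beta * y) / c for p, y in zip(f_protected, y_support)]

mean_protected = sum(p * y for p, y in zip(f_protected, y_support))
mean_always = sum(p * y for p, y in zip(f_always, y_support))
```

With β > 0 the always-infected distribution puts more mass on high outcome values, so its mean exceeds the protected-stratum mean; β = 0 recovers *g* = 0, under which the two distributions coincide.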

A straightforward application of Bayes’ rule shows that assumptions 2–4 are equivalent to the existence of a real-valued function *r*(·) such that, with ω(*u*) ≡ {1 + exp(−*u*)}^{−1},

$$\text{Pr}\phantom{\rule{thinmathspace}{0ex}}\{S(1)=1|S(0)=1,Y(0),\phantom{\rule{thinmathspace}{0ex}}X\}=\omega [r(X)+g\{Y(0),X\}].$$

(4)

We let (*g*) denote the model for the distribution of *W* defined by randomization (2) and assumptions 1–4. Shepherd *et al.* (2006) noted that *m*(*z, X*) is identified under condition (1) and (*g*) and satisfies

$$\begin{array}{c}\hfill m(1,X)=E(Y|S=1,Z=1,X),\hfill \\ \hfill m(0,X)=\frac{E[\omega \{r(X)+g(Y,\phantom{\rule{thinmathspace}{0ex}}X)\}Y|S=1,\phantom{\rule{thinmathspace}{0ex}}Z=0,X]}{E[\omega \{r(X)+g(Y,\phantom{\rule{thinmathspace}{0ex}}X)\}|S=1,\phantom{\rule{thinmathspace}{0ex}}Z=0,X]}\hfill \end{array}$$

(5)

where, for each *x, r*(*x*) is the unique solution of the equation

$$E[\omega \{r(X)+g(Y,X)\}|S=1,Z=0,X=x]=\frac{\text{Pr}(S=1|Z=1,\phantom{\rule{thinmathspace}{0ex}}X=x)}{\text{Pr}(S=1|Z=0,\phantom{\rule{thinmathspace}{0ex}}X=x)}.$$

(6)
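In the no-covariate case, displays (5) and (6) suggest a simple plug-in computation: solve the sample analogue of equation (6) for *r* by monotone root finding, then form the weighted mean in expression (5). The sketch below does this on synthetic data with the linear tilt *g*(*y*) = β*y*; all function names and constants are ours:

```python
import math
import random

def omega(u):
    """omega(u) = {1 + exp(-u)}^{-1}, the logistic function of the paper."""
    return 1.0 / (1.0 + math.exp(-u))

def solve_r(y0, target, g, lo=-30.0, hi=30.0, tol=1e-10):
    """Bisection for the r solving mean(omega(r + g(y))) = target, the sample
    analogue of equation (6); the left-hand side is increasing in r."""
    def lhs(r):
        return sum(omega(r + g(y)) for y in y0) / len(y0) - target
    assert lhs(lo) < 0 < lhs(hi), "target outside the attainable range"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def m0_hat(y0, p1, p0, g):
    """Plug-in analogue of expression (5) for m(0): a weighted mean of the
    Z = 0 infected outcomes with weights omega(r + g(y))."""
    r = solve_r(y0, p1 / p0, g)
    w = [omega(r + g(y)) for y in y0]
    return sum(wi * yi for wi, yi in zip(w, y0)) / sum(w)

# Synthetic illustration; p1 and p0 play the roles of Pr(S=1|Z=1), Pr(S=1|Z=0).
random.seed(1)
y0 = [random.gauss(3.0, 1.0) for _ in range(500)]
est_null = m0_hat(y0, p1=0.10, p0=0.20, g=lambda y: 0.0)      # g = 0
est_tilt = m0_hat(y0, p1=0.10, p0=0.20, g=lambda y: 0.5 * y)  # beta = 0.5
```

Under *g* = 0 the weights are constant, so the estimate reduces to the sample mean of the *Z* = 0 infected outcomes; a positive β upweights large outcomes and pushes the estimate upwards, which is the mechanism the sensitivity analysis varies.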

Theorem 1 below establishes that, under condition (1), model (*g*) imposes on the distribution of the observed data the unique restriction

$$\text{Pr}(S=1|Z=0,X)>\text{Pr}(S=1|Z=1,X)>0\text{with probability}\phantom{\rule{thinmathspace}{0ex}}1.$$

(7)

An important consequence is that, for any *g* and *g*′, models (*g*) and (*g*′) impose the same restrictions on the observed data distribution. Thus, the observed data cannot help to discriminate *g* from *g*′, i.e. *g* is not identified. In settings where it can be *a priori* assumed that assumptions 1–3 hold, these results provide the mathematical justification for an analytical strategy in which inference about *E*{*Y*(*z*)|*X*, *S*(1) = *S*(0) = 1} is drawn repeatedly under model (*g*), each time regarding a different function *g* as fixed and known, in a form of sensitivity analysis. Expression (4) facilitates the interpretation of the function *g* and thus may help the investigator to determine its plausible range in each particular application. For example, in the HIV vaccine trial setting, identity (4) implies that, for any fixed *x*, the function *g*(*x*, ·) determines whether and how the rate of infection under vaccine, among subjects who would also be infected under placebo and have covariate values *x*, changes with viral load level under placebo. Interestingly, model (*g*) is like the biased sampling model of Bickel *et al.* (1993), page 113, when their selection probabilities are all equal, except that, in their model, the stratum weights ω are known, whereas, under model (*g*), they depend on the unknown function *r*(·) and must be estimated.

*Theorem 1*. Suppose that condition (1) holds. Then, for any function *g*, the observed data laws that are allowed by model (*g*) are those that satisfy restriction (7).

*Remark 1* (identification in the absence of covariates). Suppose that assumptions 1–4 are restated without conditioning on *X*. Then theorem 1 remains valid when *X* is removed from all conditional statements and *r*(·) is replaced by a constant α. Indeed, without covariates, assumptions 1 and 3 and equation (4) are precisely the assumptions that were made by Gilbert *et al.* (2003) to identify *m*(*z*). However, without explicitly stating it, they actually conducted maximum likelihood (ML) estimation of *m*(*z*) under a model that allowed for the possibility that α could be ∞, i.e. that assumption 2 did not hold. Although Gilbert *et al.* (2003) considered only functions *g*(*y*) = β*y* for some user-specified β, their ML estimator equally applies to any arbitrary user-specified function *g*(·).

**4. Estimation of m(z, X)**

When *X* has a finite discrete sample space, a consistent estimator of *m*(*z*, *X*) under model (*g*) may be obtained by using the estimator of *m*(*z*, *x*) that was given by Gilbert *et al.* (2003), within each level *x* of *X*. However, if *X* is continuous and/or high dimensional, these estimators are infeasible, because the data are too sparse to conduct stratified estimation. Furthermore, estimating *m*(*z*, *x*) by using smoothing methods will not be useful in practice because of the curse of dimensionality. For this reason, in this paper we consider estimation of *m*(*z*, *X*) by assuming that

$$m(z,X)=m(z,X;{\gamma}^{*}),\quad z\in \{0,1\},$$

(8)

where, for each *z* and *x*, *m*(*z*, *x*; γ) is a smooth function of a *q* × 1 vector γ and its true value γ* is unknown. We call expression (8) a *P*_{E,E}-marginal structural mean model, because it is a model for the marginal mean of a counterfactual random variable in the subpopulation *P*_{E,E}. In addition, we denote by (*g*) the model that is defined like (*g*) but with the additional restriction (8).

Although γ* is identified under model (*g*), its estimation remains infeasible when *X* is high dimensional owing to the curse of dimensionality. Specifically, as implied by theorem 2 below, influence functions of regular asymptotically linear (RAL) estimators of γ* in model (*g*) typically depend on the unknown function *r*(·) that is defined by equation (6) and do not have mean 0 when evaluated at a misspecified function *r*(·). (The only exceptions are models (*g*) where *m*(*z* = 1, *X*) is determined by the parameters indexing the model for *m*(*z* = 0, *X*), but these are of little or no interest in studies that are aimed at investigating treatment effects, because they *a priori* assume the relationship between *m*(*z* = 1, *X*) and *m*(*z* = 0, *X*).) The implication is that estimation of γ* under model (*g*) would require the preliminary consistent estimation of the function *r*(·) by using smoothing techniques. However, when *X* is a vector with two or more continuous components, this function would not be well estimated given the moderate sample sizes that are found in practice, essentially because no two subjects would have values of *X* that are sufficiently near to allow the borrowing of information that is necessary for smoothing. Thus, in practice, we are forced to conduct inference under a reduced model for *r*(·). In this paper, we consider inference under the assumption that *r*(·) follows a parametric model

$$r(x)=r(x;{\alpha}^{*})$$

(9)

where, for each *x*, *r*(*x*; α) is a smooth function of a *p* × 1 vector α and its true value α* is unknown. Denote by (*g*) the model that is defined like (*g*) but with the additional restriction (9). The following theorem establishes the restrictions that are placed on the observed data law by (*g*). We subsequently discuss its implications for inference about γ*.

*Theorem 2*. Under condition (1), the observed data laws that are allowed by (*g*) are those satisfying the restrictions *E*{*q*_{1}(*O*; α*, γ*)|*X*} = 0 (restriction 1) and that *E*[*S* ω{*r*(*X*; α*) + *g*(*Y*, *X*)}^{1−Z}|*Z*, *X*] does not depend on *Z* (restriction 2), where

$${q}_{1}(O;\alpha ,\gamma )\equiv \left(\begin{array}{c}SZ\{Y-m(Z,X;\gamma )\}\\ S(1-Z)\omega \{r(X;\alpha )+g(Y,X)\}\{Y-m(Z,X;\gamma )\}\end{array}\right).$$

Consider the model that is defined like model (*g*), except that *g* is assumed to follow a parametric model *g*(*X*, *Y*; δ) that is indexed by an unknown parameter δ. It is easily shown that theorem 2 remains valid if *g*(*X*, *Y*) and *q*_{1}(*O*; α, γ) are replaced by *g*(*X*, *Y*; δ) and *q*_{1}(*O*; α, γ, δ) respectively. This, in turn, implies that, if the dimensions of (δ, α, γ) are not too high, then δ will be identified. Consequently, one could estimate δ consistently. However, since δ is identified only because of models that are imposed to alleviate the curse of dimensionality, we take the philosophical stance that this model should not be considered for inference. Rather, we should adopt model (*g*) (or, better yet, the model that is defined below) and conduct a sensitivity analysis over various choices of *g* (for more discussion on this point, see Scharfstein *et al.* (1999)).

Theorem 2 has the following consequences for inference about (γ*,α*). If restriction 1 were the only restriction defining the model, then results from Chamberlain (1987) concerning models that are defined by zero conditional mean restrictions would imply that RAL estimators of (α*, γ*) have influence functions of the form *d*_{1}(*X*) *q*_{1}(*O*;α*, γ*) for some (*q* + *p*) × 2 function *d*_{1}(*X*). Likewise, if restriction 2 were the only restriction defining the model, then results in Chamberlain (1987) about models that are defined by conditional mean independence restrictions would imply that, if an RAL estimator of α* existed, its influence function would have to equal *d*_{2}(*X*) *q*_{2}(*O*;α*) for some *p* × 1 function *d*_{2}(*X*), where

$${q}_{2}(O;\alpha )\equiv (S\phantom{\rule{thinmathspace}{0ex}}\omega {\{r(X;\alpha )+g(Y,X)\}}^{1-Z}-E[S\phantom{\rule{thinmathspace}{0ex}}\omega {\{r(X;\alpha )+g(Y,X)\}}^{1-Z}|X])\{Z-E(Z|X)\}.$$

(Note that, if restriction 2 were the only restriction defining the model, then γ* would not be defined.) In the light of this discussion, we would expect that RAL estimators of (α*, γ*) in model (*g*) would have influence functions of the form *d*_{1}(*X*) *q*_{1}(*O*; α*, γ*) + *d*_{2}(*X*) *q*_{2}(*O*; α*) for some (*q* + *p*) × 2 functions *d*_{1}(*X*) and *d*_{2}(*X*). This is indeed so, as stated below in part (c) of theorem 3. However, results in Klaassen (1987) indicate that influence functions of RAL estimators must be estimable consistently. Since *q*_{2}(*O*; α*) depends on the unknown functions Pr(*Z* = 1|*X*) and *E*[*S* ω{*r*(*X*; α*) + *g*(*Y*)}^{1−Z} | *X*], and these are not estimable consistently without further assumptions, we conclude that RAL estimators in model (*g*) have influence functions of the form *d*_{1}(*X*) *q*_{1}(*O*; α*, γ*), i.e. with *d*_{2}(*X*) = 0. In essence, this result implies that under model (*g*) we shall never be able to exploit restriction 2 for estimation of α*. This in turn implies that, if the dimension of α* is large, then RAL estimators of α* and γ* in model (*g*) will generally have large sampling variability or may not even exist, because α* and γ* will be weakly identified or may even be non-identified by restriction 1. To see this, consider the following extreme situation. Suppose that *X* is discrete with, say, *k* levels, and we pose models for *m*(*z* = 1, *X*) and *m*(*z* = 0, *X*) that are indexed by variation-independent parameters γ_{0} and γ_{1}, and a fully saturated model for *r*(·). Then restriction 1 determines a set of *k* population equations, which are insufficient to identify the *k* components of α and the additional parameter γ_{0}.
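The non-identification in the extreme no-covariate case can be seen numerically. With *g*(*y*) = β*y*, the relevant population equation among *S* = 1, *Z* = 0 subjects is *E*[ω(α + β*Y*)(*Y* − γ₀)] = 0, and the sketch below (synthetic draws; all names and constants ours) exhibits two different (α, γ₀) pairs satisfying it, so restriction 1 alone pins down a curve rather than a point:

```python
import math
import random

def omega(u):
    return 1.0 / (1.0 + math.exp(-u))

# Synthetic Z = 0 infected outcomes and an analyst-chosen tilt slope beta.
random.seed(3)
y = [random.gauss(0.0, 1.0) for _ in range(4000)]
beta = 1.0

def eq(alpha, gamma0):
    """Sample analogue of E[omega(alpha + beta*Y) * (Y - gamma0)]."""
    return sum(omega(alpha + beta * yi) * (yi - gamma0) for yi in y) / len(y)

def weighted_mean(alpha):
    """omega-weighted mean of y; strictly decreasing in alpha when beta > 0."""
    w = [omega(alpha + beta * yi) for yi in y]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def solve_alpha(gamma0, lo=-40.0, hi=40.0, tol=1e-10):
    """Bisection for the alpha at which the weighted mean equals gamma0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weighted_mean(mid) > gamma0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two different gamma0 values, each paired with an alpha solving the SAME
# population equation: (alpha, gamma0) is not identified by this restriction.
a1 = solve_alpha(0.1)
a2 = solve_alpha(0.4)
```

Because the weighted mean sweeps continuously over a whole interval as α varies, every γ₀ in that interval is attainable; further restrictions (or known randomization probabilities, as below) are needed to pin the parameters down.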

Fortunately, in the context of randomized studies, the difficulty can be remedied, because the relevant model under which to conduct inference is model (*g*), which is defined like (*g*), but with the additional restriction that the randomization probabilities are known, i.e. Pr(*Z* = 1|*X*) is a known function of *X*.

Part (a) of theorem 3 below establishes that RAL estimators of (α*, γ*) under model (*g*) have influence functions that can be written as *h*(*O*; *d**, α*, γ*) ≡ *d**(*X*) *q*(*O*; α*, γ*) for some (*q* + *p*) × 4 vector function *d**(*X*), where *q*(*O*; α, γ) ≡ (*q*_{1}(*O*; α, γ)′, *q*_{2}(*O*; α, γ), *Z* − *E*(*Z*|*X*))′ for any (α, γ). Subsequently, in remark 3, we explain why the functional form of *h*(*O*; *d**, α*, γ*) implies that estimators of (α, γ) that exploit restriction 2 can be constructed under model (*g*). Define

$${d}_{\text{eff}}(X)=E{\left\{\frac{\partial}{\partial (\alpha \prime ,\gamma \prime )}q(O;\alpha ,\gamma ){|}_{(\alpha ,\gamma )=({\alpha}^{*},{\gamma}^{*})}|X\right\}}^{\prime}\text{var}{\{q(O;{\alpha}^{*},{\gamma}^{*})|X\}}^{-1}.$$

(10)

Throughout, for any pair of matrices *T*_{1} and *T*_{2}, *T*_{1} ≤ *T*_{2} means that *T*_{2} − *T*_{1} is positive semi-definite, and ${T}_{1}^{\otimes 2}$ denotes ${T}_{1}{T}_{1}^{T}$.

*Theorem 3*.

- (a) If (α̂, γ̂) is an RAL estimator of (α*, γ*) in model (*g*), then there is a (*q* + *p*) × 4 vector function *d*(*X*) such that (α̂, γ̂) has influence function *h*(*O*; *d**, α*, γ*), where *d**(*X*) = [*E*{*h*(*O*; *d*, α*, γ*) *h*(*O*; *d*_{eff}, α*, γ*)^{T}}]^{−1} *d*(*X*). Equivalently,
$${n}^{1/2}\left\{\left(\begin{array}{c}\widehat{\alpha}\\ \widehat{\gamma}\end{array}\right)-\left(\begin{array}{c}{\alpha}^{*}\\ {\gamma}^{*}\end{array}\right)\right\}\underset{n\to \infty}{\to}N(0,\mathrm{\Sigma})\quad \text{where}\ \mathrm{\Sigma}\equiv E\{h{(O;{d}^{*},{\alpha}^{*},{\gamma}^{*})}^{\otimes 2}\}.$$
- (b) The function *h*(*O*; ${d}_{\text{eff}}^{*}$, α*, γ*) is the efficient influence function for RAL estimators of (α*, γ*) in model (*g*). Thus, for any *h*(*O*; *d**, α*, γ*),
$$E\{h{(O;{d}_{\text{eff}}^{*},{\alpha}^{*},{\gamma}^{*})}^{\otimes 2}\}\le E\{h{(O;{d}^{*},{\alpha}^{*},{\gamma}^{*})}^{\otimes 2}\}.$$
- (c) If (α̂, γ̂) is an RAL estimator of (α*, γ*) in model (*g*), then there is a (*q* + *p*) × 4 function *d*(*X*) with fourth column equal to **0** such that (α̂, γ̂) has influence function *h*(*O*; *d**, α*, γ*).

Consider the estimating equation

$$\sum _{i}d({X}_{i})\phantom{\rule{thinmathspace}{0ex}}q({O}_{i};\alpha ,\gamma )=0$$

(11)

where *d*(*X*) is an arbitrary (*q* + *p*) × 4 vector function. Using standard Taylor series expansion arguments, it can be shown that, if equation (11) has a solution (α̂, γ̂) that is RAL, then the influence function of (α̂, γ̂) must be equal to *h*(*O*; *d**, α*, γ*) with *d** defined in theorem 3. Thus, part (a) of theorem 3 implies that, if RAL estimators (α̂, γ̂) of (α*, γ*) in model (*g*) exist, we can indeed obtain them all (up to asymptotic equivalence) by solving estimating equations of the form (11) for adequate choices of *d*(*X*). In addition, a consistent estimator of the asymptotic variance Σ can be constructed as *n*^{−1} Σ_{i} *h*(*O _{i}*; *d̂**, α̂, γ̂)^{⊗2}, where

$${\widehat{d}}^{*}(X)={\{{n}^{-1}{\displaystyle \sum _{i}d({X}_{i})\phantom{\rule{thinmathspace}{0ex}}\partial q({O}_{i};\alpha ,\gamma )/\partial (\alpha \prime ,\gamma \prime ){|}_{(\alpha ,\gamma )=(\widehat{\alpha},\widehat{\gamma})}}\}}^{-1}d(X).$$
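The mechanics behind a display like (11) are generic Z-estimation: solve a sum of estimating functions for the parameter, then estimate the asymptotic variance from the empirical influence functions. The one-parameter sketch below uses a toy ψ of our own (not the paper's *q*) to show the Newton step and the *n*^{−1} Σ_{i} ĥ^{⊗2} variance formula:

```python
def solve_estimating_equation(psi, data, theta0, tol=1e-10, max_iter=100):
    """1-D Newton's method for sum_i psi(o_i, theta) = 0 with a numerical
    derivative; a sketch of solving an estimating equation like (11) when
    the parameter is a single scalar."""
    theta, h = theta0, 1e-6
    for _ in range(max_iter):
        f = sum(psi(o, theta) for o in data)
        fp = (sum(psi(o, theta + h) for o in data) - f) / h
        step = f / fp
        theta -= step
        if abs(step) < tol:
            break
    return theta

def sandwich_variance(psi, data, theta):
    """Variance of the influence function, n^{-1} sum_i h_i^2, with
    h_i = D^{-1} psi_i and D the average derivative of psi in theta."""
    n, h = len(data), 1e-6
    D = sum(psi(o, theta + h) - psi(o, theta) for o in data) / (n * h)
    return sum((psi(o, theta) / D) ** 2 for o in data) / n

# Toy check: psi(y, theta) = y - theta estimates the mean, and the sandwich
# formula then reduces to the (biased) sample variance.
data = [1.0, 2.0, 4.0, 5.0]
theta_hat = solve_estimating_equation(lambda y, t: y - t, data, 0.0)
var_hat = sandwich_variance(lambda y, t: y - t, data, theta_hat)
```

In the paper's setting ψ would be the (*q* + *p*)-dimensional *d*(*X_i*) *q*(*O_i*; α, γ) and the scalar derivative becomes a Jacobian, but the solve-then-sandwich structure is the same.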

We noted earlier that under model (*g*) restriction 2 of theorem 2 cannot be effectively exploited for estimation of (α*, γ*) because *q*_{2}(*O*; α) depends on the unknowns Pr(*Z* = 1|*X*) and *E*[*S* ω{*r*(*X*; α*) + *g*(*Y*)}^{1−Z} | *X*]. In contrast, under (*g*), we can use estimating equations that do exploit restriction 2. In particular, the equation

$$\sum _{i}{d}_{2}({X}_{i}){S}_{i}\phantom{\rule{thinmathspace}{0ex}}\omega {\{r({X}_{i};\alpha )+g({Y}_{i})\}}^{1-Zi}\phantom{\rule{thinmathspace}{0ex}}\{{Z}_{i}-E(Z|{X}_{i})\}=0$$

that is obtained after setting *d*_{3}(*X*) equal to *E*[*S* ω{*r*(*X*;α*) + *g*(*Y*)}^{1−Z}|*X*] has this form and does not depend on unknown nuisance functions because, under model (*g*), *E*(*Z*|*X*) is known.

The estimating equations (11) may fail to have a solution. This is best understood in the absence of covariates. Then *r*(*x*; α) = α and γ = (γ_{0}, γ_{1}) with ${\gamma}_{0}^{*}=m(z=0)$ and ${\gamma}_{1}^{*}=m(z=1)$. Suppose that *g*(*Y*) = β*Y* and *d* = *d*(*X*) is the constant 3 × 4 matrix whose last column is equal to **0** and whose first three columns define the 3 × 3 identity matrix. Then, solving the third equation in equations (11) is equivalent to solving

$${\{{\displaystyle \sum _{i}{S}_{i}(1-{Z}_{i})}\}}^{-1}\phantom{\rule{thinmathspace}{0ex}}{\displaystyle \sum _{i}{S}_{i}(1-{Z}_{i})}\phantom{\rule{thinmathspace}{0ex}}\omega (\alpha +\beta {Y}_{i})={\widehat{p}}_{1}/{\widehat{p}}_{0},$$

where

$${\widehat{p}}_{j}\equiv {\displaystyle \sum _{i}{S}_{i}\phantom{\rule{thinmathspace}{0ex}}I({Z}_{i}=j)/{\displaystyle \sum _{i}I({Z}_{i}=j)}}$$

is an unbiased estimator of Pr{*S*(*j*) = 1}. The resulting estimator of α is precisely the estimator of Gilbert *et al.* (2003). Since ω(α + β*Y _{i}*) is between 0 and 1, the left-hand side of this equation, being an average of values of ω, is also between 0 and 1 for any value of α. However, although Pr{*S*(1) = 1}/Pr{*S*(0) = 1} is less than 1 under assumptions 1 and 2, its estimate *p̂*_{1}/*p̂*_{0} can equal or exceed 1 in finite samples, in which case the equation has no solution.
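The non-existence issue is easy to see numerically. In the sketch below (toy numbers ours), the left-hand side of the α-equation is an average of logistic values, so it is trapped strictly inside (0, 1) no matter how extreme α is; a ratio estimate at or above 1 therefore cannot be matched:

```python
import math

def omega(u):
    return 1.0 / (1.0 + math.exp(-u))

# The alpha-equation sets the average of omega(alpha + beta*y), taken over the
# Z = 0 infected subjects, equal to p1_hat / p0_hat.  Each omega value lies
# strictly between 0 and 1, hence so does the average for EVERY alpha; if
# sampling error pushes p1_hat / p0_hat to 1 or above, no solution exists.
y0 = [2.0, 2.5, 3.0, 3.5]   # hypothetical outcomes of the Z = 0 infected
beta = 0.5

def lhs(alpha):
    return sum(omega(alpha + beta * y) for y in y0) / len(y0)

low, high = lhs(-5.0), lhs(5.0)   # the attainable values form a subset of (0, 1)
```

The left-hand side increases with α but only approaches 1 in the limit, which is exactly why a finite-sample ratio above 1 leaves the estimator of α undefined.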

Part (b) of theorem 3 implies that solving equation (11) with *d*_{eff}(*X*) instead of *d*(*X*) yields a globally efficient estimator of (α*, γ*) in model (*g*), i.e. an estimator whose limiting distribution, after centring at (α*, γ*) and standardizing by √*n*, is normal with mean 0 and variance equal to *E*{*h*(*O*; ${d}_{\text{eff}}^{*}$, α*, γ*)^{⊗2}} under any law that is allowed by (*g*). Unfortunately, such an estimator is infeasible, because *d*_{eff}(*X*) depends on components of the unknown observed data distribution *F*_{O}. However, we can obtain an estimator (α̂_{loc,eff}, γ̂_{loc,eff}) that is locally semiparametric efficient in model (*g*) at a ‘working’ parametric submodel _{work}(*g*), i.e. an estimator that is consistent and asymptotically normal under model (*g*) with asymptotic variance equal to the semiparametric variance bound under any law that is allowed by _{work}(*g*). To construct such an estimator, we can proceed as follows.

*Step 1*: specify a working parametric submodel _{work}(*g*) that is indexed by (α, γ) and finite dimensional nuisance parameters, say η. This model determines a parametric model *F*_{O}(·; α, γ, η) for the distribution *F*_{O}(·) of the observed data *O*.

*Step 2*: compute the ML estimator (ᾱ, γ̄, η̄) of (α, γ, η) under model _{work}(*g*).

*Step 3*: compute *d̂*_{eff}(*X*), which is defined as *d*_{eff}(*X*) in equation (10), but with expectations calculated under the law *F*_{O}(·; ᾱ, γ̄, η̄).

*Step 4*: solve equation (11) by using *d̂*_{eff}(*X*) for *d*(*X*).

It is standard to show (see, for example, Robins *et al.* (1992)) that, under regularity conditions, the estimating equations that are solved in step 4 have a solution (α̂_{loc,eff}, γ̂_{loc,eff}) that is locally semiparametric efficient in model (*g*), at the submodel _{work}(*g*). We can ensure that the working parametric model is indeed a submodel of (*g*) by arguing as follows. Suppose that we postulate parametric models that are indexed by η_{1} and η_{0} for the conditional distributions of ε_{1} ≡ *Y*(1) − *m*(*z* = 1, *X*) and ε_{0} ≡ *Y*(0) − *m*(*z* = 0, *X*) given *X*, *S*(0) = 1 and *S*(1) = 1 respectively, and a working parametric submodel Pr{*S*(1) = 1|*X*; η_{2}} that is indexed by η_{2} for Pr{*S*(1) = 1|*X*}. These submodels are necessarily compatible with model (*g*) because model (*g*) imposes restrictions neither on the conditional distributions of ε_{1} and ε_{0} given *X*, *S*(0) = 1 and *S*(1) = 1 nor on Pr{*S*(1) = 1|*X*}. To construct the likelihood *L*_{n}(α, γ, η) under such a model, we marginalize the joint distribution of the counterfactuals over the missing information as in Frangakis and Rubin (2002). We effectively implement this marginalization by first deriving the restrictions that are implied by our model for the counterfactuals on the observed data distribution and then constructing the likelihood based on the derived model for the observed data. Specifically, under condition (1), the submodels for the conditional distributions of ε_{1} and ε_{0}, together with the *P*_{E,E}-marginal structural mean model, determine fully parametric models *f*(*Y*|*S* = 1, *Z* = 1, *X*; γ, η_{1}) for the distribution *f*(*Y*|*S* = 1, *Z* = 1, *X*), and *f**(*Y*|*X*; γ, η_{0}) for the distribution

$${f}^{*}(Y|X)\equiv \frac{\omega \{r(X;{\alpha}^{*})+g(X,Y)\}\phantom{\rule{thinmathspace}{0ex}}f(Y|S=1,\phantom{\rule{thinmathspace}{0ex}}Z=0,\phantom{\rule{thinmathspace}{0ex}}X)}{E[\omega \{r(X;{\alpha}^{*})+g(X,Y)\}\phantom{\rule{thinmathspace}{0ex}}|S=1,\phantom{\rule{thinmathspace}{0ex}}Z=0,\phantom{\rule{thinmathspace}{0ex}}X]}.$$

(12)

In turn, model *f** (*Y*|*X*; γ, η_{0}) determines a parametric model *f*(*Y*|*S* = 1,*Z* = 0,*X*; γ,α, η_{0}) for *f*(*Y*|*S* = 1,*Z* = 0, *X*). Also, restriction 2 of theorem 2 is equivalent to the restriction

$$\text{Pr}(S=1|Z=1,\phantom{\rule{thinmathspace}{0ex}}X)=E[\omega \{r(X;{\alpha}^{*})+g(X,Y)\}|S=1,Z=0,X]\phantom{\rule{thinmathspace}{0ex}}\text{Pr}(S=1|Z=0,X).$$

(13)

Thus, under condition (1), the model Pr{*S*(1) = 1|*X*; η_{2}} for Pr{*S*(1) = 1|*X*} determines a parametric model for Pr(*S* = 1|*Z* = 1, *X*) that is indexed by η_{2} and a parametric model for Pr(*S* = 1|*Z* = 0, *X*) that is indexed by (γ, α, η_{0}, η_{2}). Under the working model, the likelihood *L*_{n}(α, γ, η) is equal to

$$\prod _{i=1}^{n}f{\{{\epsilon}_{0,i}(\gamma )|{S}_{i}=1,{Z}_{i}=0,{X}_{i};{\eta}_{0}\}}^{{S}_{i}(1-{Z}_{i})}f{\{{\epsilon}_{1,i}(\gamma )|{S}_{i}=1,{Z}_{i}=1,{X}_{i};{\eta}_{1}\}}^{{S}_{i}{Z}_{i}}\phantom{\rule{thinmathspace}{0ex}}{Q}_{i}(\alpha ,\gamma ,{\eta}_{0},{\eta}_{2})$$

where η = (η_{0}, η_{1}, η_{2}),

$${Q}_{i}(\alpha ,\gamma ,{\eta}_{0},{\eta}_{2})={\{{\mathrm{\Gamma}}_{i}({\eta}_{2})\phantom{\rule{thinmathspace}{0ex}}{R}_{i}{(\alpha ,\gamma ,{\eta}_{0})}^{{Z}_{i}-1}\}}^{{S}_{i}}{\{1-{\mathrm{\Gamma}}_{i}({\eta}_{2})\phantom{\rule{thinmathspace}{0ex}}{R}_{i}{(\alpha ,\gamma ,{\eta}_{0})}^{{Z}_{i}-1}\}}^{1-{S}_{i}},$$

Γ_{i}(η_{2}) ≡ Pr(*S*_{i} = 1|*Z*_{i} = 1, *X*_{i}; η_{2}) and

$${R}_{i}(\alpha ,\gamma ,{\eta}_{0})=E[\omega \{r({X}_{i};\alpha )+g({X}_{i},Y)\}\phantom{\rule{thinmathspace}{0ex}}|{S}_{i}=1,\phantom{\rule{thinmathspace}{0ex}}{Z}_{i}=0,\phantom{\rule{thinmathspace}{0ex}}{X}_{i};\gamma ,{\eta}_{0}].$$

In steps 2 and 3 we can replace the ML estimator $(\bar{\alpha}, \bar{\gamma}, \bar{\eta})$ with the easier-to-compute estimator $(\tilde{\alpha}, \tilde{\gamma}, \tilde{\eta})$, where $(\tilde{\alpha}, \tilde{\gamma})$ solves equation (11) for an arbitrary choice of *d*(*X*) and $\tilde{\eta}$ is the value of η that maximizes $L_{n}(\tilde{\alpha}, \tilde{\gamma}, \eta)$. The output of the thus-modified algorithm still returns a locally efficient estimator of (α, γ).

The ML estimator $(\bar{\alpha}, \bar{\gamma})$ is one of the estimators that were proposed by Shepherd *et al.* (2005). Note, however, that in contrast with $(\bar{\alpha}, \bar{\gamma})$, our proposed estimator $(\hat{\alpha}_{\text{loc,eff}}, \hat{\gamma}_{\text{loc,eff}})$ is consistent regardless of whether or not the models for *f*{ε_{1}|*S*(0) = 1, *S*(1) = 1, *X*}, *f*{ε_{0}|*S*(0) = 1, *S*(1) = 1, *X*} and Pr(*S* = 1|*Z* = 0, *X*) are correctly specified.

A simulation study was carried out to evaluate the finite sample properties of our proposed estimator $(\hat{\alpha}_{\text{loc,eff}}, \hat{\gamma}_{\text{loc,eff}})$. We conducted seven experiments, each consisting of 1000 repetitions, so that 95% confidence intervals were expected to cover the true parameters with roughly a 1.3% margin of error. We generated *n* independent data vectors *W*_{i} under a data-generating process compatible with the models

$$\begin{array}{c}\hfill r(X;\alpha )={\alpha}_{0}+{\alpha}_{1}X,\hfill \\ \hfill m(z;X;\gamma )={\gamma}_{0}+{\gamma}_{1}X+{\gamma}_{2}z+{\gamma}_{3}zX,\hfill \\ \hfill g(X,Y)=\beta Y\hfill \end{array}\}$$

(14)

where the values of $({\alpha}_{0}^{*},{\alpha}_{1}^{*})$, $({\gamma}_{0}^{*},{\gamma}_{1}^{*})$ and β are given below, and ${\gamma}_{2}^{*}={\gamma}_{3}^{*}=0$. This set-up implies that treatment *Z* has indeed no effect on the conditional mean of the outcome in the subpopulation _{E,E}. In addition, under our data-generating process the distributions of *Y*(0) and *Y*(1) given *X* in the subpopulation _{E,E} are the same and equal to *N*{*m*(*z*, *X*; γ*), 1}. The distribution of *Y*(0) in the subpopulation _{E,Ē} is normal with mean *m*(*z*, *X*; γ*) − β and variance 1. Our parameter values were chosen so that roughly 70% of the subjects with *S*(0) = 1 and *X* = 38 (38 being the mean of *X*, as indicated next) would also have *S*(1) = 1, i.e. so that Pr{*S*(1) = 1|*S*(0) = 1, *X* = 38} ≈ 0.7.
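A minimal sketch of a data-generating process of this kind follows (Python). The particular parameter values, the distribution of *X*, and the choice ω = expit are all hypothetical stand-ins, not the paper's exact design; only the structure (normal outcomes in the always-infected stratum, selection via the tilting weight) follows the text.

```python
import math
import random

random.seed(0)

def expit(u):
    return 1.0 / (1.0 + math.exp(-u))

# Hypothetical parameter values for models (14); gamma2 = gamma3 = 0 as in the text
a0, a1 = -1.0, 0.05      # r(X; alpha) = a0 + a1 * X
g0, g1 = 1.0, 0.05       # m(z, X; gamma) = g0 + g1 * X (no z-effect)
beta = 1.0               # g(X, Y) = beta * Y

def draw_subject():
    X = random.gauss(38.0, 5.0)                  # covariate with mean 38 (sd hypothetical)
    Z = 1 if random.random() < 0.5 else 0        # randomization probability 0.5
    # In the always-infected stratum, Y(0) and Y(1) share the N{m(z, X), 1} law
    Y0 = random.gauss(g0 + g1 * X, 1.0)
    # Given S(0) = 1, infection under vaccine occurs with probability omega{r(X) + beta*Y(0)}
    S1 = 1 if random.random() < expit(a0 + a1 * X + beta * Y0) else 0
    return X, Z, Y0, S1

sample = [draw_subject() for _ in range(2000)]
```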

For each *i* = 1, …, *n*, *Z*_{i} was generated as Bernoulli with randomization probability Pr(*Z*_{i} = 1) = 0.5, and *X*_{i} was generated with mean 38.

Table 1 and Table 2 report the results for inference about the parameters γ = (γ_{0}, γ_{1}, γ_{2}, γ_{3}) and α = (α_{0}, α_{1}) respectively of the models (14) for *n* = 20000 and *n* = 2000. Each table reports the Monte Carlo mean (labelled ‘Mean’) and median (labelled ‘Median’) of the estimators, the Monte Carlo coverage probability of nominal 95% Wald confidence intervals (labelled ‘CP’) and their median length (labelled ‘Length’). Table 1 and Table 2 report results for the following estimators: the naïve ordinary least squares estimator from the regression of *Y* on *Z* and *X* based on observations with *S* = 1 (labelled OLS), the inefficient estimator of (α, γ) (labelled INE) solving equation (11) that uses

$$d(X)\prime =\left(\begin{array}{cccccc}\hfill 1\hfill & \hfill X\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill X\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill X\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill b(X)\hfill & \hfill X\phantom{\rule{thinmathspace}{0ex}}b(X)\hfill \end{array}\right),$$

(15)

where *b*(*X*) was chosen so that

$${q}_{2}(O;\alpha )+b(X)\{Z-E(Z|X)\}=S\omega {\{r(X;\alpha )+g(X,Y)\}}^{1-Z}\{Z-E(Z|X)\},$$

and the locally efficient estimator under (correctly specified) working models that assume a logistic model for Pr(*S* = 1|*Z* = 0, *X*) and normal distributions with variance equal to 1 for *f* *(*Y*|*X*) and *f*(*Y*|*S* = 1,*Z* = 1) (labelled EFF). All estimators were computed under the true value of β. The starting values of γ and α for the algorithm solving the inefficient estimating equation were set at the OLS estimator of γ and at α = (0, 0). The resulting estimates were used as starting values for the algorithm solving the locally efficient estimating equation. When *n* = 2000, the inefficient estimation algorithm did not converge in 0.4–14.4% of runs, depending on the value of β (more frequently for the larger values of β), because the algorithm failed to find a root of the estimating equation. These runs were discarded and replaced with new runs. In each of the faulty runs we examined whether the algorithm solving the locally efficient estimating equation converged when started at values of γ and α near the true values. In all the runs that were investigated, the algorithm converged. We therefore attribute the lack of convergence of the algorithm solving the inefficient equation to the poor choice of function *d*(*X*). We do not regard this failure as serious, since, faced with it, an investigator would try different *d*(*X*) and different starting values until the algorithm converged.
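The convergence remedy described above (try other starting values, or another *d*(*X*), until the estimating equation has a root) has a simple generic shape. In the sketch below the Newton solver and the toy equation are illustrative stand-ins, not the paper's actual solver:

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    # Basic Newton-Raphson for a scalar estimating equation f(x) = 0
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, True
        d = df(x)
        if d == 0:
            return x, False
        x = x - fx / d
    return x, False

def solve_with_restarts(f, df, starts):
    # Mirror the strategy in the text: if one start fails, move to the next
    for x0 in starts:
        try:
            root, ok = newton(f, df, x0)
        except (OverflowError, ZeroDivisionError):
            continue
        if ok:
            return root
    raise RuntimeError("no starting value converged; try a different d(X) or starts")

# Toy estimating equation with root at sqrt(2); the first start (0.0) fails
root = solve_with_restarts(lambda x: x * x - 2.0, lambda x: 2.0 * x, [0.0, 1.0])
```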

Table 1. Mean and median of $(\hat{\gamma}_{0}, \hat{\gamma}_{1}, \hat{\gamma}_{2}, \hat{\gamma}_{3})$, coverage probability CP and median length of the associated 95% confidence intervals in two 1000-run simulation studies with randomization probability **...**

Table 2. Mean and median of $(\hat{\alpha}_{0}, \hat{\alpha}_{1})$, coverage probability CP and median length of the associated 95% confidence interval in two 1000-run simulation studies with randomization probability *P*(*Z* = 1) = 0.5, probability of the event **...**

Our results for *n* = 20000 confirm that the properties established by theorem 3 hold. The naïve OLS estimator was biased, increasingly so as the value of β departed from 0. When β = 3, this bias was sufficiently severe to reverse the sign of $\hat{\gamma}_{0}$. OLS estimators of the mean shift parameter γ_{2} were significantly far from 0 even when β = 1. Also, as predicted by the theory, both the efficient and the inefficient estimators were unbiased. Coverage probabilities of the 95% confidence intervals were close to the nominal value, and efficiency, as measured by the median interval length, was somewhat better when the locally efficient estimator was used to centre the intervals, the gains in efficiency being more pronounced for estimation of α. Curiously, intervals centred at the inefficient estimator had poor coverage when β = 3, whereas coverage was substantially improved when the intervals were centred at the efficient estimator. No significant gains in efficiency were obtained from the locally efficient estimator when β was 0.1. This came as no surprise since this value, relative to the variance of *Y*, is very close to 0. At β = 0, it can easily be shown that *d*_{eff}(*X*) = *d*(*X*) given in equation (15), and ω does not depend on *Y*, so the locally efficient and inefficient estimators are algebraically identical. Results for *n* = 2000 were qualitatively similar to those for *n* = 20000, except that for β = 3 both the inefficient and the efficient estimators were biased and the coverage probability of the intervals was poor. This demonstrates that the asymptotic distribution is not a good approximation to the finite sample distribution of $(\hat{\alpha}, \hat{\gamma})$ when β is large, even if *n* is moderate. The extra experiment with β = 2 is meant to show that the asymptotic distribution is still a good approximation when *n* = 2000 at this value of β.

In this section, we apply our methods to data from a randomized, double-blind, placebo-controlled phase III trial, which was conducted by VaxGen between 1998 and 2003. This trial sought to evaluate the performance of the vaccine AIDSVAX B/B among a total of 5403 HIV negative at-risk individuals. The ratio of vaccine to placebo assignment was 2:1. Overall, the vaccine was found to be ineffective in preventing HIV infection. Exploratory subgroup analyses, however, suggested the possibility that the vaccine partially prevented infection in non-white and high risk subjects (rgp120 HIV Vaccine Study Group, 2005).

In our analysis we eliminated 21 subjects who did not enrol in the post-infection phase of the study, and three subjects who started antiretroviral therapy before assessment of their HIV viral load. After doing so we were left with 223 infected out of 3823 subjects in the vaccine arm and 121 out of 1920 in the placebo arm. Our analysis is for illustrative purposes only. An adequate analysis would also adjust for potential selection bias that is induced by our censoring rule.

A Wald test of the hypothesis *H*_{0}: Pr(*S* = 1|*Z* = 1) = Pr(*S* = 1|*Z* = 0) had a *p*-value of 0.244, thus raising doubts about the validity of assumptions 1 and 2, and hence the applicability of our methods to the entire study population. However, as previously mentioned, possible partial vaccine efficacy was observed in non-white subjects, suggesting that our assumptions 1–4 are reasonable for this subgroup. Indeed, this subgroup consisted of 28 HIV infected out of 309 subjects in the placebo arm, and 28 out of 602 in the vaccine arm, yielding a *p*-value of 0.008 for the Wald test of hypothesis *H*_{0}. We therefore estimated the treatment effect on the mean viral load, as measured on the common logarithm (log_{10}) scale, at a study visit 2 weeks after infection was diagnosed, among non-white subjects conditional on the base-line covariates age and risk score. Specifically, we estimated the parameters of the model
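The subgroup comparison can be reproduced approximately from the counts just quoted (Python). Treating the test as a one-sided two-proportion Wald test is our assumption, since the text does not state its exact form:

```python
import math

# Non-white subgroup counts from the text
x0, n0 = 28, 309   # infections / subjects, placebo arm
x1, n1 = 28, 602   # infections / subjects, vaccine arm

p0, p1 = x0 / n0, x1 / n1
se = math.sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)
z = (p0 - p1) / se

# One-sided normal p-value via the error function
p_value = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))
```

This gives *z* ≈ 2.39 and a one-sided *p*-value of about 0.008, consistent with the value quoted in the text.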

$$m(Z,X;\gamma )={\gamma}_{0}+{\gamma}_{1}\,\text{Age}+{\gamma}_{2}\,\text{Risk}+{\gamma}_{3}Z+{\gamma}_{4}Z\,\text{Age}+{\gamma}_{5}Z\,\text{Risk}.$$

For subjects who did not come in for a visit 2 weeks after HIV diagnosis, we used as outcome the viral load measurement at the visit closest in time. Of the infected subjects, seven out of 56 had a viral load that reached the assay’s measurement lower limit. These subjects were assigned viral load levels equal to the lower detection limit log_{10}(400). Since our methods do not require specification of the viral load distribution, this imputation does not induce model misspecification. Indeed, the mean of the thus-defined outcome can be viewed as the mean of the viral load distribution truncated at the assay measurement limit.
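The outcome construction for left-censored viral loads amounts to the following (Python; the detection limit of 400 copies/ml is as stated in the text):

```python
import math

LOD = 400.0  # assay lower limit of quantification, copies/ml

def log10_outcome(measured_copies):
    # Values below the limit are set to the limit itself, so the outcome mean
    # is the mean of the viral load distribution truncated at the assay limit
    return math.log10(max(measured_copies, LOD))
```

Because the methods place no distributional assumption on viral load, this floored outcome is well defined and introduces no model misspecification.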

In our analysis, we assumed normal working distributions for common logarithm viral load among always infected _{E,E} subjects in both treatment arms. The variance of these distributions was assumed constant over *X* and estimated as such. The probability Pr(*S* = 1|*Z* = 0, *X*) was estimated by logistic regression.

Fig. 1 and Fig. 2 display the locally efficient estimates of $({\gamma}_{4}^{*},{\gamma}_{5}^{*})$ and of the vaccine effect

$$m(1,\text{Age},\text{Risk})-m(0,\text{Age},\text{Risk})={\gamma}_{3}^{*}+{\gamma}_{4}^{*}\phantom{\rule{thinmathspace}{0ex}}\text{Age}+{\gamma}_{5}^{*}\phantom{\rule{thinmathspace}{0ex}}\text{Risk}$$

for various values of age and risk score respectively, as a function of the sensitivity parameter β in the range (−2.5, 2.5), and their associated pointwise 95% confidence intervals. To interpret the results, note that a difference of 0.5 in the mean of common logarithm viral load is generally considered to be the smallest clinically significant vaccine effect size. The range (−2.5, 2.5) for β was chosen to reflect possibly severe discrepancies between the distributions of viral load (conditional on the covariates) in the always infected and the protected subjects. Values of β close to 2.5 correspond to the case in which the distribution of viral loads in the always infected subjects is severely tilted to the right relative to that of the protected individuals (the reverse being true when β is close to −2.5). We chose the values 0, 1 and 4 for risk score as they reflect low, medium and high risk. The chosen values of 28, 34 and 39 for age correspond to the 25th, 50th and 75th percentiles of its empirical distribution in the non-white sample respectively. Positive (negative) values of β correspond to assuming that, the higher (lower) the viral load under placebo, the more likely it is that an individual who is infected under placebo would also have been infected under vaccine or, equivalently, that the distribution of viral load under placebo in the always infected population lies to the right (left) of that of the protected population. The confidence intervals in Fig. 1 include the value 0 for nearly all values of β, the only exception being the intervals for γ_{4} when β < −1. This suggests an effect modification by age only under the rather unrealistic assumption that the protected individuals tend to have much higher levels of viral load under placebo than the always infected subjects. This effect modification is reflected in Fig. 2, where for sufficiently negative values of β there is some evidence of a detrimental vaccine effect at age 28 years, but no vaccine effect at the other ages. Fig. 2 also shows that, for other values of β, there is no evidence of a vaccine effect.
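The sensitivity analysis behind these figures has a simple generic shape: re-fit the model at each fixed β on a grid and collect Wald limits. A sketch follows (Python; `fit` is a hypothetical stand-in for the locally efficient fitting routine, here replaced by a toy function):

```python
def sensitivity_curve(fit, betas):
    # fit(beta) -> (estimate, standard error) with beta held fixed at that value
    rows = []
    for b in betas:
        est, se = fit(b)
        rows.append((b, est, est - 1.96 * se, est + 1.96 * se))
    return rows

# Grid over (-2.5, 2.5) as in the text; toy fit with estimate linear in beta
betas = [-2.5 + 0.5 * k for k in range(11)]
curve = sensitivity_curve(lambda b: (0.1 * b, 0.2), betas)
```

Plotting the four columns of `curve` against β reproduces the format of Figs 1 and 2: a point-estimate curve bracketed by pointwise 95% bands.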

Fig. 1. Locally efficient estimates of (a) ${\gamma}_{4}^{*}$ and (b) ${\gamma}_{5}^{*}$ as a function of the sensitivity parameter β and their associated pointwise 95% confidence intervals

Fig. 2. Locally efficient estimate of the vaccine effect
$m(1,\text{Age},\text{Risk})-m(0,\text{Age},\text{Risk})={\gamma}_{3}^{*}+{\gamma}_{4}^{*}\phantom{\rule{thinmathspace}{0ex}}\text{Age}+{\gamma}_{5}^{*}\phantom{\rule{thinmathspace}{0ex}}\text{Risk}$
for various values of age and risk score as a function of the sensitivity parameter β and their associated pointwise **...**

This work was supported by National Institutes of Health grants R29 GM48704 (Jemiai and Rotnitzky), R01 AI32475 (Rotnitzky) and 1 RO1 AI054165-01 (Gilbert).

Consider any joint distribution of (*Y*, *S*, *Z*, *X*) satisfying condition (7). To prove theorem 1 we must exhibit a joint distribution for (*S*(0), *S*(1), *Y*(0), *Y*(1), *Z*, *X*, *Y*, *S*) satisfying condition (1) and the conditions defining model (*g*). We do this in the absence of covariates *X*, since the construction can then be repeated within each level of *X*. We define the candidate joint distribution in the following steps:

- (a) given (*Z* = *z*, *Y*, *S*), (*Y*(*z*), *S*(*z*)) ≡ (*Y*, *S*), i.e. the distribution of (*Y*(*z*), *S*(*z*)) given (*Z* = *z*, *Y*, *S*) is a point mass at (*Y*, *S*);
- (b) Pr{*S*(0) = 1|*S*(1) = 1, *S*, *Y*, *Y*(1), *Z* = 1} ≡ 1;
- (c) Pr{*S*(0) = 1|*S*(1) = 0, *S*, *Y*, *Y*(1), *Z* = 1} ≡ {Pr(*S* = 1|*Z* = 0) − Pr(*S* = 1|*Z* = 1)}/Pr(*S* = 0|*Z* = 1) (note that this is well defined because, by condition (7), the right-hand side is a positive number);
- (d) Pr{*S*(1) = 1|*S*(0) = 0, *S*, *Y*(0), *Y*, *Z* = 0} ≡ 0;
- (e) Pr{*S*(1) = 1|*S*(0) = 1, *S*, *Y*(0), *Y*, *Z* = 0} ≡ ω{*r*_{0} + *g*(*Y*)}, where *r*_{0} is the unique solution to the equation

$$E[\omega \{{r}_{0}+g(Y)\}|S=1,Z=0]=\frac{\text{Pr}(S=1|Z=1)}{\text{Pr}(S=1|Z=0)}.$$

(16)
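The bookkeeping in these steps can be verified numerically for a hypothetical pair of selection probabilities compatible with condition (7) (here taken to mean Pr(*S* = 1|*Z* = 0) ≥ Pr(*S* = 1|*Z* = 1) > 0):

```python
# Hypothetical marginal infection probabilities with p0 > p1
p0 = 0.40   # Pr(S = 1 | Z = 0)
p1 = 0.30   # Pr(S = 1 | Z = 1)

# Step (c): Pr{S(0) = 1 | S(1) = 0, Z = 1} is a valid probability
c = (p0 - p1) / (1.0 - p1)
assert 0.0 < c < 1.0

# Law of total probability, as in the proof's later display:
# Pr{S(0) = 1 | Z = 1} = 1 * Pr{S(1) = 1 | Z = 1} + c * Pr{S(1) = 0 | Z = 1}
pS0_given_Z1 = 1.0 * p1 + c * (1.0 - p1)
```

The identity `pS0_given_Z1 == p0` confirms that the construction matches Pr{*S*(0) = 1|*Z* = 0}, as the proof requires.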

So far, we have constructed a joint distribution for (*S*(1), *S*(0), *Y*(z), *S, Y*), given *Z* = *z, z* = 0, 1. We finalize our construction by defining the candidate distribution of *Y*(1−*z*) given (*S*(1), *S*(0), *Y*(*z*), *S, Y,Z* = *z*), *z* = 0, 1, as

$$f\{Y(1-z)|S(1),S(0),Y(z),Y,S,Z=z\}\equiv f\{Y(1-z)|S(1),S(0),S,Z=1-z\}.$$

(17)

We now show that the candidate distribution satisfies condition (1) and the restrictions defining model (*g*). Condition (1) is satisfied by construction in step (a). To show that condition (2) holds, first note that step (b) implies that Pr{*S*(0) = 1|*S*(1) = 1, *Z* = 1} = 1 and therefore that Pr{*S*(0) = *S*(1) = 1|*Z* = 1} = Pr{*S*(1) = 1|*Z* = 1}. In addition,

$$\begin{array}{cc}\text{Pr}\phantom{\rule{thinmathspace}{0ex}}\{S(0)=S(1)=1|Z=0\}\hfill & =E[\omega \{{r}_{0}+g(Y)\}|S=1,Z=0]\phantom{\rule{thinmathspace}{0ex}}\text{Pr}(S=1|Z=0)\hfill \\ \hfill & =\text{Pr}(S=1|Z=1)=\text{Pr}\{S(1)=1|Z=1\},\hfill \end{array}$$

where the first equality is by steps (e) and (a), the second is by equation (16) and the third is by step (a). Therefore, Pr{*S*(0) = 1, *S*(1) = 1|*Z* = 0} = Pr{*S*(0) = 1, *S*(1) = 1|*Z* = 1}. Also, Pr{*S*(1) = 1, *S*(0) = 0|*Z* = 0} = 0 by step (d) and Pr{*S*(1) = 1, *S*(0) = 0|*Z* = 1} = 0 by step (b). Consequently, Pr{*S*(1) = 1, *S*(0) = 0|*Z* = 0} = Pr{*S*(1) = 1, *S*(0) = 0|*Z* = 1} = 0, and we have

$$\begin{array}{c}\text{Pr}\{S(0)=1|Z=1\}={\displaystyle \sum _{j=0}^{1}\text{Pr}\{S(0)=1|S(1)=j,Z=1\}\phantom{\rule{thinmathspace}{0ex}}\text{Pr}\{S(1)=j|Z=1\}=\text{Pr}\{S(1)=1|Z=1\}}\hfill \\ \hfill +\frac{\text{Pr}(S=1|Z=0)-\text{Pr}(S=1|Z=1)}{\text{Pr}(S=0|Z=1)}\text{Pr}\{S(1)=0|Z=1\}=\text{Pr}\{S(0)=1|Z=0\}\end{array}$$

where the second equality is by steps (b) and (c) and the third is by step (a).

This concludes the proof that (*S*(0), *S*(1)) ⫫ *Z*. That (*Y*(0), *Y*(1)) ⫫ *Z* | *S*(0), *S*(1) follows because, by construction in expression (17), *Y*(0) and *Y*(1) are conditionally independent given (*S*(0), *S*(1), *Z*) and, in addition, *f*{*Y*(1−*z*)|*S*(0) = 1, *S*(1), *Z* = *z*} is equal to *f*{*Y*(1−*z*)|*S*(0) = 1, *S*(1), *Z* = 1−*z*}, *z* = 0, 1. This concludes the proof that condition (2) holds.

To show that assumptions 2 and 3 hold it suffices to show that Pr{*S*(0) = 1}>0 and 0<Pr{*S*(1) = 1|*S*(0) = 1}<1. Now, Pr{*S*(0) = 1} = Pr{*S*(0) = 1|*Z* = 0} by condition (2) and the right-hand side is equal to Pr(*S* = 1|*Z* = 0) by step (a) which in turn is greater than 0 by condition (7). Furthermore,

$$\begin{array}{cc}\text{Pr}\{S(1)=1|S(0)=1\}\hfill & =\text{Pr}\{S(1)=1|S(0)=1,Z=0\}\hfill \\ \hfill & =E[\omega \{{r}_{0}+g(Y)\}|S(0)=1,Z=0]\hfill \end{array}$$

where the first equality is by condition (2) and the second is by step (e). Thus, 0 < Pr{*S*(1) = 1|*S*(0) = 1} < 1 follows because 0 < ω{*r*_{0} + *g*(*y*)} < 1 for any *y*. This shows that assumptions 2 and 3 hold. Condition 1 follows by step (d) and condition (2). Finally, assumption 4 follows directly from step (e) after application of Bayes' rule.

That restrictions 1 and 2 are restrictions on the observed data distribution that are implied by model (*g*) follows easily from expressions (5) and (6) respectively. To show that these are the only restrictions that are imposed by model (*g*) we must exhibit, for any distribution of (*Y*, *S*, *Z*, *X*) satisfying restrictions 1 and 2, a joint distribution of (*X*, *Z*, *S*, *Y*, *S*(0), *S*(1), *Y*(0), *Y*(1)) that satisfies condition (1) and the restrictions defining model (*g*). Such a distribution can be constructed within each level of *X*, exactly as in the proof of theorem 1, but replacing ω{*r*_{0} + *g*(*Y*)} with ω{*r*(*X*; α*) + *g*(*X*, *Y*)}, which satisfies equation (16) conditional on *X* by restriction 2.

Since, under model (*g*), condition (13) holds, the model can be parameterized with the known function *g*(·,·), the known conditional probability function Pr(*Z* = *z*|*X* = ·), the unknown *q* × 1 and *p* × 1 parameters γ and α, and an infinite dimensional parameter indexing *f*(*X*), Pr(*S* = 1|*Z* = 1, *X*), *f*(ε_{0}|*S* = 1, *Z* = 0, *X*) and *f*(ε_{1}|*S* = 1, *Z* = 1, *X*), where ε_{z} ≡ *Y* − *E*(*Y*|*S* = 1, *Z* = *z*, *X*), *z* ∈ {0, 1}. The set of influence functions of RAL estimators of (γ, α) under the model is the set $\{E{({AS}_{\text{eff}}^{T})}^{-1}A : A$ is a (*q* + *p*) × 1 random vector with *j*th entry ${A}_{j}\in {\mathrm{\Lambda}}^{\perp}\}$, where *S*_{eff} is the efficient score for (γ, α) and ${\mathrm{\Lambda}}^{\perp}={\mathrm{\Lambda}}_{x}^{\perp}\phantom{\rule{thinmathspace}{0ex}}\cap \phantom{\rule{thinmathspace}{0ex}}{\mathrm{\Lambda}}_{s}^{\perp}\phantom{\rule{thinmathspace}{0ex}}\cap \phantom{\rule{thinmathspace}{0ex}}{\mathrm{\Lambda}}_{0}^{\perp}\phantom{\rule{thinmathspace}{0ex}}\cap \phantom{\rule{thinmathspace}{0ex}}{\mathrm{\Lambda}}_{1}^{\perp}$. The sets Λ_{x}, Λ_{s}, Λ_{0} and Λ_{1} are the infinite dimensional parameter-specific tangent sets comprised of products of constant conformable vectors times the scores for the parameters in any regular parametric submodel for *f*(*X*), Pr(*S* = 1|*Z* = 1, *X*), *f*(ε_{0}|*S* = 1, *Z* = 0, *X*) and *f*(ε_{1}|*S* = 1, *Z* = 1, *X*) respectively (for a precise definition of the nuisance tangent space see, for example, Newey (1990)). Throughout, a set superscripted with ‘⊥’ denotes the orthogonal complement of that set, and Π(·|·) denotes the projection operator in the Hilbert space ${\mathcal{L}}_{2}^{0}({F}_{O})$ of zero-mean, finite variance random variables with covariance inner product under the observed data law *F*_{O}.

Consider arbitrary correctly specified parametric submodels *f*(*X*; ϕ), Pr(*S* = 1|*Z* = 1, *X*; η_{2}), *f*(ε_{0}|*S* = 1, *Z* = 0, *X*; η_{0}) and *f*(ε_{1}|*S* = 1, *Z* = 1, *X*; η_{1}), where $({\varphi}^{*},{\eta}_{0}^{*},{\eta}_{1}^{*},{\eta}_{2}^{*})$ corresponds to the true distribution. The likelihood based on a single observation *O* is proportional to *L*(η, α, γ, ϕ) = *f*{ε_{0}(γ)|*S* = 1, *Z* = 0, *X*; η_{0}}^{S(1−Z)} *f*{ε_{1}(γ)|*S* = 1, *Z* = 1, *X*; η_{1}}^{SZ} *Q*(α, γ, η_{0}, η_{2}) *f*(*X*; ϕ), where η ≡ (η_{0}, η_{1}, η_{2}),

$$Q(\alpha ,\gamma ,{\eta}_{0},{\eta}_{2})\equiv {\{\mathrm{\Gamma}({\eta}_{2})\phantom{\rule{thinmathspace}{0ex}}R{(\alpha ,\gamma ,{\eta}_{0})}^{Z-1}\}}^{S}{\{1-\mathrm{\Gamma}({\eta}_{2})\phantom{\rule{thinmathspace}{0ex}}R{(\alpha ,\gamma ,{\eta}_{0})}^{Z-1}\}}^{1-S},$$

Γ(η_{2}) ≡ Pr(*S* = 1|*Z* = 1, *X*; η_{2}),

$$R(\alpha ,\gamma ,{\eta}_{0})\equiv E[\omega \{r(X;\alpha )+g(X,Y)\}|S=1,Z=0,X;{\eta}_{0},\gamma ]$$

and ε_{z}(γ) ≡ *Y* − *E*(*Y*|*S* = 1, *Z* = *z*, *X*; γ), *z* ∈ {0, 1}.

Thus, with $S_{{\eta}_{j}}$ denoting the score for η_{j}, we have

$${S}_{{\eta}_{2}}=\{S-E(S|Z,X)\}\,\frac{\partial}{\partial {\eta}_{2}}\text{logit}\{\mathrm{\Gamma}({\eta}_{2})\}{|}_{{\eta}_{2}={\eta}_{2}^{*}}\,{u}_{0}{(X)}^{1-Z}$$

where logit(ν) = log {ν/(1 − ν)} and u_{0}(*X*) = {1 − *E*(*S*|*Z* = 1, *X*)}/{1−*E*(*S*|*Z* = 0, *X*)}. Also, after some algebra and using the fact that

$$\frac{\partial}{\partial {\eta}_{0}}R({\alpha}^{*},{\gamma}^{*},{\eta}_{0}){|}_{{\eta}_{0}={\eta}_{0}^{*}}=E({\omega}^{*}{A}_{0}|S=1,Z=0,X)$$

where

$${A}_{0}\equiv {a}_{0}({\epsilon}_{0},X)\equiv \frac{\partial}{\partial {\eta}_{0}}\text{log}\{f({\epsilon}_{0}|S=1,Z=0,X;{\eta}_{0})\}{|}_{{\eta}_{0}={\eta}_{0}^{*}}$$

it can be shown that

$${S}_{{\eta}_{0}}=(1-Z)[S{A}_{0}-\{S-E(S|Z=0,X)\}E({\omega}^{*}{A}_{0}|S=1,Z=0,X)/{u}_{1}(X)]$$

where ${u}_{1}(X)=R({\alpha}^{*},{\gamma}^{*},{\eta}_{0}^{*})\{1-E(S|Z=0,X)\}$. From these expressions we can deduce that ${\mathrm{\Lambda}}_{s}=\{\{S-E(S|Z,X)\}\,{u}_{0}{(X)}^{1-Z}b(X):b(X)\ \text{arbitrary}\}\cap {\mathcal{L}}_{2}^{0}({F}_{O})$ and that

$$\begin{array}{c}\hfill {\mathrm{\Lambda}}_{0}={\mathcal{L}}_{2}^{0}({F}_{O})\phantom{\rule{thinmathspace}{0ex}}\cap \phantom{\rule{thinmathspace}{0ex}}\{(1-Z)[S{A}_{0}-\{S-E(S|Z=0,X)\}E({\omega}^{*}{A}_{0}|S=1,Z=0,X)/{u}_{1}(X)]:\\ \hfill {A}_{0}={a}_{0}({\epsilon}_{0},X)\phantom{\rule{thinmathspace}{0ex}}\text{satisfies}\phantom{\rule{thinmathspace}{0ex}}E({\omega}^{*}{\epsilon}_{0}{A}_{0}|S=1,Z=0,X)=E({A}_{0}|S=1,Z=0,X)=0\}.\end{array}$$

We arrive at the last set after noting that restriction 1 implies that $E({\omega}^{*}{\epsilon}_{0}{A}_{0}|S=1,Z=0,X)=0$ and that *A*_{0}, being a score in a model for a conditional distribution given *X*, *Z* = 0 and *S* = 1, has conditional mean 0. Also, a similar argument leads to

$${\mathrm{\Lambda}}_{1}={\mathcal{L}}_{2}^{0}({F}_{O})\phantom{\rule{thinmathspace}{0ex}}\cap \phantom{\rule{thinmathspace}{0ex}}\{SZ{A}_{1}:{A}_{1}\equiv {a}_{1}({\epsilon}_{1},X)\phantom{\rule{thinmathspace}{0ex}}\text{satisfies}\phantom{\rule{thinmathspace}{0ex}}E({\epsilon}_{1}{A}_{1}|S=1,Z=1,X)=E({A}_{1}|S=1,Z=1,X)=0\}.$$

In addition, ${\mathrm{\Lambda}}_{x}={\mathcal{L}}_{2}^{0}({F}_{O})\cap \{c(X):E\{c(X)\}=0\}$. We next show that

$$\begin{array}{cc}{\mathrm{\Lambda}}_{s}^{\perp}\hfill & ={\mathcal{L}}_{2}^{0}({F}_{O})\phantom{\rule{thinmathspace}{0ex}}\cap \phantom{\rule{thinmathspace}{0ex}}\{\{{D}_{1}-E({D}_{1}|X,Z,S)\}+h(X)-E\{h(X)\}+\{S{\omega}^{*1-Z}-E(S{\omega}^{*1-Z}|X)\}\hfill \\ \hfill & \phantom{=}\times \phantom{\rule{thinmathspace}{0ex}}\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{3}(X)+\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{2}(X):{D}_{1}\equiv {d}_{1}(Y,X,Z,S),\ {d}_{2},h\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{d}_{3}\phantom{\rule{thinmathspace}{0ex}}\text{arbitrary}\}.\hfill \end{array}$$

To do so, first note that any element of ${\mathcal{L}}_{2}^{0}({F}_{O})$ can be written as

$${D}_{1}-E({D}_{1}|X,Z,S)+\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{2}(X)+h(X)-E\{h(X)\}+\{S-E(S|X,Z)\}\{Z{t}_{1}(X)+{t}_{0}(X)(1-Z)\}$$

for some *D*_{1} ≡ *d*_{1}(*Y*, *X*, *Z*, *S*), *h*(*X*), *d*_{2}(*X*), *t*_{0}(*X*) and *t*_{1}(*X*). Next, note that any element of Λ_{s}, being a function of (*X*, *Z*, *S*), is uncorrelated with *D*_{1} − *E*(*D*_{1}|*X*, *Z*, *S*). In addition, since any element of Λ_{s} is a linear combination of (or a limit of linear combinations of) scores under submodels for *f*(*S*|*X*, *Z*), it must have mean 0 given (*X*, *Z*) and is therefore uncorrelated with {*Z* − *E*(*Z*|*X*)} *d*_{2}(*X*) + *h*(*X*) − *E*{*h*(*X*)}. So it remains to determine the subset of random variables of the form {*S* − *E*(*S*|*X*, *Z*)}{*Z* *t*_{1}(*X*) + *t*_{0}(*X*)(1−*Z*)} that are uncorrelated with the elements of Λ_{s}. These satisfy, for all *b*(*X*),

$$0=E[\{S-E(S|X,Z)\}\{Z\phantom{\rule{thinmathspace}{0ex}}{t}_{1}(X)+{t}_{0}(X)(1-Z)\}\{S-E(S|Z,X)\}\phantom{\rule{thinmathspace}{0ex}}{u}_{0}{(X)}^{1-Z}b(X)],$$

which after some calculations can be seen to be equivalent to

$${t}_{0}(X)=-{t}_{1}(X)\phantom{\rule{thinmathspace}{0ex}}E({\omega}^{*}|S=1,Z=0,X)\phantom{\rule{thinmathspace}{0ex}}E(Z|X){\{1-E(Z|X)\}}^{-1}$$

from where we can write

$$\begin{array}{cc}\{S-E(S|X,Z)\}\phantom{\rule{thinmathspace}{0ex}}\{Z\phantom{\rule{thinmathspace}{0ex}}{t}_{1}(X)+{t}_{0}(X)(1-Z)\}\hfill & =\{E{({\omega}^{*}|S=1,Z=0,X)}^{1-Z}-{\omega}^{*1-Z}\}{t}_{1}^{*}(X)S\{Z-E(Z|X)\}\hfill \\ \hfill & +{t}_{1}^{*}(X)\{Z-E(Z|X)\}S{\omega}^{*1-Z}-{t}_{1}^{*}(X)\phantom{\rule{thinmathspace}{0ex}}E(S|X,Z)\hfill \\ \hfill & \times E{({\omega}^{*}|S=1,Z=0,X)}^{1-Z}\{Z-E(Z|X)\}\hfill \end{array}$$

with ${t}_{1}^{*}(X)={t}_{1}(X){\{1-E(Z|X)\}}^{-1}$. But the first term of the right-hand side of the last identity is of the form *D*_{1} − *E*(*D*_{1}|*X*, *Z*, *S*). Also, the last term is of the form *d*_{2}(*X*){*Z* − *E*(*Z*|*X*)} because $E(S|X,Z)\,E{({\omega}^{*}|S=1,Z=0,X)}^{1-Z}$ does not depend on *Z*. Finally, the postulated form for ${\mathrm{\Lambda}}_{s}^{\perp}$ is obtained after noting that ${t}_{1}^{*}(X)$ is unrestricted because *t*_{1}(*X*) is unrestricted.

We now characterize the elements of ${\mathrm{\Lambda}}_{s}^{\perp}$ that are also in ${\mathrm{\Lambda}}_{0}^{\perp}\phantom{\rule{thinmathspace}{0ex}}\cap \phantom{\rule{thinmathspace}{0ex}}{\mathrm{\Lambda}}_{1}^{\perp}$. First note that the elements of Λ_{0} and Λ_{1} are linear combinations of (or limits of linear combinations of) scores for submodels for *f*(*Y*|*Z*, *S*, *X*) and/or *f*(*S*|*Z*, *X*). Therefore, they are uncorrelated with {*Z* − *E*(*Z*|*X*)} *d*_{2}(*X*) + *h*(*X*) − *E*{*h*(*X*)}, since this is a function of (*X*, *Z*) only. In addition, it is straightforward to check that the elements of Λ_{0} and Λ_{1} are also uncorrelated with random variables of the form $S{\omega}^{*1-Z}\{Z-E(Z|X)\}\,{d}_{3}(X)$. Thus, it remains to find the subset of the random variables of the form ${D}_{1}^{*}\equiv {D}_{1}-E({D}_{1}|X,Z,S)$ that are orthogonal to both Λ_{0} and Λ_{1}. Now, write ${D}_{1}^{*}=SZ\{{M}_{1}-E({M}_{1}|X,Z=1,S=1)\}+S(1-Z)\{{M}_{0}-E({M}_{0}|X,Z=0,S=1)\}$ for some *M*_{1} ≡ *m*_{1}(*Y*, *X*) and *M*_{0} ≡ *m*_{0}(*Y*, *X*). It can easily be checked that ${D}_{1}^{*}$ is uncorrelated with the elements of Λ_{0} if and only if, for all *A*_{0} ≡ *a*_{0}(ε_{0}, *X*) defined as in the set Λ_{0}, $0=E[S(1-Z)\{{M}_{0}-E({M}_{0}|X,Z=0,S=1)\}{A}_{0}]$. Equivalently, *M*_{0} satisfies 0 = cov{*m*_{0}(*Y*, *X*), *a*_{0}(ε_{0}, *X*)|*Z* = 0, *S* = 1, *X*} for all *A*_{0}. Reasoning again as in Chamberlain (1987) or Robins and Rotnitzky (1995), we conclude that there is a *d*_{0}(*X*) such that ${m}_{0}(Y,X)={d}_{0}(X){\omega}^{*}{\epsilon}_{0}$. Similarly, we can arrive at the expression *m*_{1}(*Y*, *X*) = *d*_{1}(*X*)ε_{1} for some *d*_{1}(*X*). We conclude that the elements of ${\mathrm{\Lambda}}_{s}^{\perp}\cap {\mathrm{\Lambda}}_{0}^{\perp}\cap {\mathrm{\Lambda}}_{1}^{\perp}$ are of the form

$$\begin{array}{c}S(1-Z)\phantom{\rule{thinmathspace}{0ex}}{d}_{0}(X){\omega}^{*}{\epsilon}_{0}+SZ\phantom{\rule{thinmathspace}{0ex}}{d}_{1}(X){\epsilon}_{1}+\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{2}(X)+h(X)-E\{h(X)\}+[S{\omega}^{*1-Z}-E\{S{\omega}^{*1-Z}|X\}]\hfill \\ \times \phantom{\rule{thinmathspace}{0ex}}\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{3}(X).\hfill \end{array}$$

The subset of these that are also orthogonal to the elements of Λ_{x} is of the form

$$S(1-Z)\phantom{\rule{thinmathspace}{0ex}}{d}_{0}(X){\omega}^{*}{\epsilon}_{0}+\mathit{\text{SZ}}\phantom{\rule{thinmathspace}{0ex}}{d}_{1}(X){\epsilon}_{1}+\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{2}(X)+\{S{\omega}^{*1-Z}-E(S{\omega}^{*1-Z}|X)\}\phantom{\rule{thinmathspace}{0ex}}\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{3}(X)$$

for arbitrary *d*_{0}(*X*), *d*_{1}(*X*), *d*_{2}(*X*) and *d*_{3}(*X*) or, equivalently, of the form *h*(*O*; *d*, α*, γ*) = *d*(*X*) *q*(*O*; α*, γ*). That the set of influence functions is as postulated follows immediately from this expression. This concludes the proof of part (a) of theorem 3.

We now prove part (c) before turning to the proof of part (b). In model (*g*), the orthogonal complement of the nuisance tangent set is the subset of elements of the form in the last display that are also uncorrelated with scores for Pr(*Z*|*X*). We now show that this subset consists of all random variables of the form

$$S(1-Z)\phantom{\rule{thinmathspace}{0ex}}{d}_{0}(X){\omega}^{*}{\epsilon}_{0}+\text{SZ}\phantom{\rule{thinmathspace}{0ex}}{d}_{1}(X){\epsilon}_{1}+\{S{\omega}^{*1-Z}-E(S{\omega}^{*1-Z}|X)\}\phantom{\rule{thinmathspace}{0ex}}\{Z-E(Z|X)\}\phantom{\rule{thinmathspace}{0ex}}{d}_{3}(X)$$

(18)

for arbitrary *d*_{1}(*X*), *d*_{0}(*X*) and *d*_{3}(*X*). This follows by noting that

- (a) a score for Pr(*Z*|*X*) has the form {*Z* − *E*(*Z*|*X*)} *t*(*X*) for some *t*(*X*);
- (b) $S(1-Z)\,{d}_{0}(X){\omega}^{*}{\epsilon}_{0}+SZ\,{d}_{1}(X){\epsilon}_{1}$ has mean 0 given (*X*, *Z*) and is therefore uncorrelated with {*Z* − *E*(*Z*|*X*)} *t*(*X*); and
- (c) $\{Z-E(Z|X)\}\,{d}_{2}(X)+S{\omega}^{*1-Z}\{Z-E(Z|X)\}\,{d}_{3}(X)$ is uncorrelated with {*Z* − *E*(*Z*|*X*)} *t*(*X*) if and only if

$$\begin{array}{cc}0\hfill & ={d}_{2}(X)\phantom{\rule{thinmathspace}{0ex}}E[{\{Z-E(Z|X)\}}^{2}|X]+{d}_{3}(X)E[E(S{\omega}^{*1-Z}|Z,X){\{Z-E(Z|X)\}}^{2}|X]\hfill \\ \hfill & =E[{\{Z-E(Z|X)\}}^{2}|X]\{{d}_{2}(X)+{d}_{3}(X)E(S{\omega}^{*1-Z}|X)\}\hfill \end{array}$$

from which it follows that ${d}_{2}(X)=-{d}_{3}(X)\,E(S{\omega}^{*1-Z}|X)$.

The proof of part (c) is concluded by noting that random variables of the form (18) can be written as *h*(*O*; *d*, α*, γ*) with the last column of *d*(*X*) equal to zero. Turning now to part (b), the efficient score for (α, γ) is the element *d*_{eff}(*X*) *q*(*O*; α*, γ*) of the orthogonal complement of the nuisance tangent space whose associated influence function has variance no larger than that of the influence function associated with *h*(*O*; *d*, α*, γ*) for any (*p* + *q*) × 4 function *d*(*X*). It is easily checked that this requirement is satisfied when *d*_{eff}(*X*) is as in equation (10). This concludes the proof of part (b).

Yannis Jemiai, Cytel Inc., Cambridge, USA.

Andrea Rotnitzky, DiTella University, Buenos Aires, Argentina, and Harvard School of Public Health, Boston, USA.

Bryan E. Shepherd, Vanderbilt University, Nashville, USA.

Peter B. Gilbert, University of Washington, Seattle, USA.

- Angrist J, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J. Am. Statist. Ass. 1996;91:444–455.
- Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press; 1993.
- Chamberlain G. Asymptotic efficiency in estimation with conditional moment restrictions. J. Econometr. 1987;34:305–334.
- Cox DR. Planning of Experiments. New York: Wiley; 1958.
- Dawid AP. Conditional independence in statistical theory (with discussion). J. R. Statist. Soc. B. 1979;41:1–31.
- Dawid AP. Causal inference without counterfactuals. J. Am. Statist. Ass. 2000;95:407–437.
- Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21–29. [PubMed]
- Gilbert PB, Bosch RJ, Hudgens MG. Sensitivity analysis for the assessment of causal vaccine effects on viral load in HIV vaccine trials. Biometrics. 2003;59:531–541. [PubMed]
- Hayden D, Pauler DK, Schoenfeld D. An estimator for treatment comparisons among survivors in randomized trials. Biometrics. 2005;61:305–310. [PubMed]
- Hudgens MG, Hoering A, Self SG. On the analysis of viral load endpoints in HIV vaccine trials. Statist. Med. 2003;22:2281–2298. [PubMed]
- Jemiai Y. Doctoral Dissertation. Boston: Department of Biostatistics, Harvard School of Public Health; 2005. Asymptotic properties of an estimator of treatment effects on an outcome only existing if a post-randomization event has occurred.
- Jemiai Y, Rotnitzky A. Sharp bounds and sensitivity analysis for treatment effects in the presence of censoring by death. Schering-Plough Wrkshp Development and Approval of Oncology Drug Products. 2003. Available from http://www.hsph.harvard.edu/live/sp-2003-rotnitzky.html.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: Wiley; 1980.
- Klaassen CAJ. Consistent estimation of the influence function of locally asymptotically linear estimators. Ann. Statist. 1987;15:1548–1562.
- Newey WK. Semiparametric efficiency bounds. J. Appl. Econometr. 1990;5:99–135.
- Neyman J. On the application of probability theory to agricultural experiments: essay on principles, section 9 (Engl. transl.) Statist. Sci. 1990;5:465–480.
- rgp120 HIV Vaccine Study Group. Placebo-controlled trial of a recombinant glycoprotein 120 vaccine to prevent HIV infection. J. Infect. Dis. 2005;191:654–665. [PubMed]
- Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods—application to control of the healthy worker survivor effect. Math. Modelling. 1986;7:1393–1512.
- Robins JM. An analytic method for randomized trials with informative censoring: part I. Lifetime Data Anal. 1995;1:241–254. [PubMed]
- Robins JM, Greenland S. Comment on “Causal inference without counterfactuals,” by A. P. Dawid. J. Am. Statist. Ass. 2000;95:431–435.
- Robins JM, Mark SD, Newey WK. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–495. [PubMed]
- Robins JM, Rotnitzky A. Semiparametric efficiency in multivariate regression models with missing data. J. Am. Statist. Ass. 1995;90:122–129.
- Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann. Statist. 1978;6:34–58.
- Rubin DB. Statistics and causal inference: which ifs have causal answers. J. Am. Statist. Ass. 1986;81:961–962.
- Rubin DB. More powerful randomization-based *p*-values in double-blind trials with noncompliance. Statist. Med. 1998;17:371–389. [PubMed]
- Rubin DB. Comment on “Causal inference without counterfactuals,” by A. P. Dawid. J. Am. Statist. Ass. 2000;95:435–437.
- Rubin DB. Causal inference through potential outcomes and principal stratification: application to studies with ‘censoring’ due to death. Statist. Sci. 2006;21:299–309.
- Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J. Am. Statist. Ass. 1999;94:1096–1120.
- Shepherd BE, Gilbert PB, Jemiai Y, Rotnitzky A. Sensitivity analyses comparing outcomes only existing in a subset selected post-randomization, conditional on covariates, with application to HIV vaccine trials. Biometrics. 2006;62:332–342. [PubMed]
- Zhang JL, Rubin DB. Estimation of causal effects via principal stratification when some outcomes are truncated by “death”. J. Educ. Behav. Statist. 2003;28:353–368.
