Biometrika. 2010 March; 97(1): 171–180.
Published online 2009 December 8. doi:  10.1093/biomet/asp062
PMCID: PMC3412601

On doubly robust estimation in a semiparametric odds ratio model

Eric J. Tchetgen Tchetgen and James M. Robins
Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, U.S.A. etchetge@hsph.harvard.edu; robins@hsph.harvard.edu


We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes that either of two variation independent baseline functions is correctly modelled, but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007).

Some key words: Doubly robust, Generalized odds ratio, Locally efficient, Semiparametric logistic regression

1. Introduction

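Given a random vector O = (Y, A, L), the conditional odds ratio function γ(Y, A, y0, a0, L) between A and Y given L at a given base point (a0, y0) is

$$\gamma(Y, A, y_0, a_0, L) = \frac{f(Y \mid A, L)\, f(y_0 \mid a_0, L)}{f(Y \mid a_0, L)\, f(y_0 \mid A, L)} = \frac{g(A \mid Y, L)\, g(a_0 \mid y_0, L)}{g(A \mid y_0, L)\, g(a_0 \mid Y, L)} = \frac{h(A, Y \mid L)\, h(a_0, y_0 \mid L)}{h(A, y_0 \mid L)\, h(a_0, Y \mid L)},$$
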
where the vectors Y and A can take either discrete values, continuous values, or a mixture of both, L is a high-dimensional vector of measured auxiliary covariates, (a0, y0) is a user specified point in the sample space and f (Y | A, L), g(A | Y, L) and h(A, Y | L) are, respectively, the conditional densities of Y given A and L, the conditional density of A given Y and L and the joint conditional density of A and Y given L with respect to a dominating measure μ. The odds ratio function is a particularly useful measure of association when Y and A take both discrete and continuous values. For instance, A and Y could each be a mixture of a discrete component encoding, say, the presence or absence of a given bacterium and a continuous component encoding the bacterial counts when it is present. In such a case, as argued by Chen (2007), a complete characterization of the association between bacterium A and bacterium Y given L would require separate comparisons of the probabilities of absence of one bacterium when the other bacterium is either absent or present at a particular concentration, and of the concentration distribution for one bacterium when the other bacterium is either absent or present at a particular concentration. Instead, the direct estimation of the odds ratio function relating bacterium A to bacterium Y given covariates L provides a unified solution to this problem and obviates the need for separate analyses.
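
As a small illustration of this definition, the sketch below computes the odds ratio function on a finite support and checks that the same value is obtained from the joint density h, from f(y | a) and from g(a | y). The probability table and the helper names `odds_ratio` and `conditional` are hypothetical and serve only as an example for a single stratum of L.

```python
from itertools import product

def odds_ratio(table, y, a, y0=0, a0=0):
    """Odds ratio at (y, a) relative to the base point (y0, a0).

    `table[(y, a)]` may hold the joint h(a, y), the conditional f(y | a)
    or the conditional g(a | y): the ratio is the same in all three cases.
    """
    return (table[(y, a)] * table[(y0, a0)]) / (table[(y, a0)] * table[(y0, a)])

def conditional(joint, given):
    """Return f(y | a) if given == "a", or g(a | y) if given == "y", keyed by (y, a)."""
    idx = 1 if given == "a" else 0
    totals = {}
    for key, p in joint.items():
        totals[key[idx]] = totals.get(key[idx], 0.0) + p
    return {key: p / totals[key[idx]] for key, p in joint.items()}

# Hypothetical joint probabilities of (Y, A) within one stratum of L.
h = {(0, 0): 0.30, (0, 1): 0.10, (0, 2): 0.05,
     (1, 0): 0.10, (1, 1): 0.20, (1, 2): 0.25}
f = conditional(h, given="a")   # f(y | a)
g = conditional(h, given="y")   # g(a | y)

for y, a in product([0, 1], [0, 1, 2]):
    values = [odds_ratio(t, y, a) for t in (h, f, g)]
    assert max(values) - min(values) < 1e-9
    print(f"gamma({y}, {a}) = {values[0]:.3f}")
```

The function equals one whenever y = y0 or a = a0, which is the normalisation used throughout the paper.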

Given n independent and identically distributed copies of O, Chen (2007) proposed a locally efficient iterative estimator of the parameter ψ0 in a semiparametric model B that specifies that (i) γ(Y, A, L) is equal to a known function γ(Y, A, L; ψ) evaluated at the unknown true p-dimensional parameter vector ψ0, i.e.

$$\gamma(Y, A, L) = \gamma(Y, A, L; \psi_0), \tag{1}$$

where γ(Y, A, L; ψ) takes the value 1 if A = a0, Y = y0, or ψ = 0, so ψ0 = 0 encodes the null hypothesis that Y and A are conditionally independent given L, and (ii) either, but not necessarily both, (a) a given parametric model f(Y | a0, L; θ) for f(Y | a0, L) or (b) a parametric model g(A | y0, L; α) for g(A | y0, L) is correct. Model B is referred to as a union model because it is the union of the model C that assumes that (i) and (iia) are true and the model D that assumes that (i) and (iib) are true. An estimator of ψ0 that is consistent and asymptotically normal under this union model is referred to as doubly robust because, given equation (1), the estimator is consistent and asymptotically normal for ψ0 if one has succeeded in specifying either a correct model for f(Y | a0, L) or a correct model for g(A | y0, L), thus giving the data analyst two chances rather than one to obtain valid inference for ψ0.

An example of a simple parametric model for the odds ratio function is the bilinear log-odds ratio model (Chen, 2003, 2004). It assumes that γ(Y, A, L; ψ0) = exp{ψ0(Y − y0) ⊗ (A − a0)}, where ⊗ is the direct product. This model includes all of the generalized linear regression models with canonical link functions as special cases. In the case of stratified 2 × 2 tables, it implies homogeneous odds ratios, but is easily extended to the case of nonhomogeneous odds ratios. Other interesting examples of odds ratio models are given by Chen (2007).
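
To illustrate the claim about canonical links, the following sketch checks numerically that a normal linear regression of a scalar Y on a scalar A has a bilinear odds ratio function with ψ = β1/σ². The regression coefficients, the evaluation points and all function names are assumptions chosen for the example; with scalar Y and A the direct product reduces to ordinary multiplication.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def odds_ratio_from_density(f, y, a, y0, a0):
    """Odds ratio built directly from a conditional density f(y, a) = f(y | a)."""
    return f(y, a) * f(y0, a0) / (f(y, a0) * f(y0, a))

def bilinear_odds_ratio(psi, y, a, y0, a0):
    """Bilinear log-odds ratio model for scalar Y and A."""
    return math.exp(psi * (y - y0) * (a - a0))

# Hypothetical regression Y | A ~ N(beta0 + beta1 * A, sigma^2).
beta0, beta1, sigma = 0.5, 1.2, 2.0

def f_normal_regression(y, a):
    return normal_pdf(y, beta0 + beta1 * a, sigma)

psi = beta1 / sigma ** 2  # implied odds ratio parameter
for y, a in [(1.0, 1.0), (-0.7, 2.5), (3.0, -1.0)]:
    lhs = odds_ratio_from_density(f_normal_regression, y, a, 0.0, 0.0)
    rhs = bilinear_odds_ratio(psi, y, a, 0.0, 0.0)
    assert math.isclose(lhs, rhs, rel_tol=1e-9)
    print(y, a, lhs, rhs)
```

The intercept β0 drops out of the odds ratio entirely, which is why it can be treated as part of the nuisance baseline density.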

Unfortunately, Chen’s aforementioned locally efficient doubly robust estimator of ψ0 under model B is computationally very demanding, especially when A and Y have multiple continuous components. The main contribution of our paper is to provide novel and highly efficient doubly robust estimators of ψ0 that are substantially easier to compute than those of Chen.

2. Preliminaries

Before describing our new approach, we briefly summarize Chen’s results. He considered the following parametric and semiparametric approaches to the estimation of ψ0: a prospective likelihood approach under the model C that assumes that one has correctly modelled the nuisance baseline function f (Y | a0, L); a retrospective likelihood approach under the model D that assumes that one has correctly specified a model for the nuisance baseline function g(A | y0, L); a joint likelihood approach under the intersection model that assumes that both models C and D are correct; and a doubly robust locally semiparametric efficient approach under the union model B of § 1.

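In his doubly robust approach, Chen establishes that in the semiparametric model 𝒜 characterized by the sole restriction (1), the density h(A, Y | L) can be written as h(A, Y | L; ψ0), where

$$h(A, Y \mid L; \psi_0) = \frac{\gamma(Y, A, L; \psi_0)\, f(Y \mid L, A = a_0)\, g(A \mid Y = y_0, L)}{\int \gamma(y, a, L; \psi_0)\, f(y \mid L, A = a_0)\, g(a \mid Y = y_0, L)\, d\mu(a, y)}, \tag{2}$$
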
f(y | L, A = a0) and g(a | Y = y0, L) are the unknown conditional densities that generated the data and are solely restricted by ∫ γ(y, a, L) f(y | L, A = a0) g(a | Y = y0, L) dμ(a, y) < ∞ almost everywhere. Then, he specifies parametric models f(Y | a0, L; θ) and g(A | y0, L; α) for the unknown nuisance baseline functions f(y | a0, L) and g(a | y0, L), obtains profile estimates θ̂(ψ) and α̂(ψ) of the nuisance parameters θ and α, and calculates the efficient score Ŝeff(ψ) ≡ Seff{θ̂(ψ), α̂(ψ), ψ} for ψ in the semiparametric model 𝒜 evaluated at the law [γ(y, a, l; ψ), f{y | a0, l; θ̂(ψ)}, g{a | y0, l; α̂(ψ)}] indexed by {θ̂(ψ), α̂(ψ), ψ}. Next, he estimates ψ0 with the solution ψ̂eff to Pn{Ŝeff(ψ)} = 0, where $P_n(H) = n^{-1} \sum_i H_i$, and proves that ψ̂eff is regular and asymptotically linear, and thus consistent and asymptotically normal, under the union model B. Furthermore, general results of Robins and Rotnitzky (2001) imply that Ŝeff(ψ) is also the efficient score for ψ in model B under the law [γ(y, a, l; ψ), f{y | a0, l; θ̂(ψ)}, g{a | y0, l; α̂(ψ)}]. It follows that the estimator ψ̂eff is locally semiparametric efficient under model B at the intersection submodel with both nuisance models correct; that is, ψ̂eff attains the semiparametric efficiency bound for the model B when both nuisance models happen to hold.
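
The representation of the joint density in terms of the odds ratio function and the two baseline densities can be checked on a finite support. The sketch below is only an illustration: the bilinear odds ratio, the baseline probabilities and the function name `joint_from_odds_ratio` are assumptions, L is held fixed, and y0 = a0 = 0.

```python
import math

def joint_from_odds_ratio(gamma, f0, g0, ys, as_):
    """h(y, a) proportional to gamma(y, a) * f(y | A = 0) * g(a | Y = 0)."""
    weights = {(y, a): gamma(y, a) * f0[y] * g0[a] for y in ys for a in as_}
    total = sum(weights.values())  # finite-support analogue of the normalising integral
    return {key: w / total for key, w in weights.items()}

ys, as_ = [0, 1, 2], [0, 1]
psi = 0.8

def gamma(y, a):
    return math.exp(psi * y * a)      # equals 1 when y = 0 or a = 0

f0 = {0: 0.5, 1: 0.3, 2: 0.2}         # hypothetical f(y | A = 0)
g0 = {0: 0.6, 1: 0.4}                 # hypothetical g(a | Y = 0)
h = joint_from_odds_ratio(gamma, f0, g0, ys, as_)

# The constructed joint reproduces both baseline conditionals ...
p_a0 = sum(h[(y, 0)] for y in ys)
p_y0 = sum(h[(0, a)] for a in as_)
assert all(math.isclose(h[(y, 0)] / p_a0, f0[y]) for y in ys)
assert all(math.isclose(h[(0, a)] / p_y0, g0[a]) for a in as_)

# ... and its odds ratio function is the gamma we started from.
for y in ys:
    for a in as_:
        ratio = h[(y, a)] * h[(0, 0)] / (h[(y, 0)] * h[(0, a)])
        assert math.isclose(ratio, gamma(y, a))
```

The three ingredients can be chosen freely of one another, which is what variation independence of (ψ, f, g) means here.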

By definition, the efficient score Seff = Π(Sψ | Λnuis⊥) for a parameter ψ in a given model is the projection of the score Sψ for ψ onto the orthocomplement Λnuis⊥ to the nuisance tangent space Λnuis in the Hilbert space L2 ≡ L2(FO) of zero-mean functions of p dimensions, T ≡ t(A, Y, L) = t(O), with inner product $E_{F_O}(T_1^{T} T_2) \equiv E(T_1^{T} T_2)$ and corresponding squared norm ‖T‖² = E(TᵀT), where FO is the distribution function that generated the data. Chen proves that for model 𝒜, the set

$$\Lambda_{\mathrm{nuis}}^{\perp} = \{D = d(Y, A, L) : E(D \mid A, L) = E(D \mid Y, L) = 0\} \cap L_2 \tag{3}$$

contains all functions that have zero-mean conditional on both (A, L) and (Y, L). When both A and Y contain continuous components and ψ0 ≠ 0, Chen (2007) finds that this projection and therefore Seff do not exist in closed form and must be computed using the iterative alternating conditional expectations algorithm. Each iteration requires the evaluation, by numerical integration, of conditional expectations, which seriously limits the practicality of Chen’s approach, particularly when A and/or Y have two or more continuous components.

The main contribution of our paper is to show that, even though the projection Π(R | Λnuis⊥) of a given random variable R = r(Y, A, L) onto the orthocomplement Λnuis⊥ does not exist in closed form when both A and Y contain continuous components, the set Λnuis⊥ does have a closed-form representation, which appears to be new. We use our representation to obtain doubly robust estimators, i.e. consistent and asymptotically normal estimators of ψ0 in the union model B, that are nearly as efficient as ψ̂eff under the intersection submodel, yet do not require the alternating conditional expectations algorithm. Moreover, our closed-form representation of Λnuis⊥ is of independent interest, with applications beyond the present paper. For example, Vansteelandt et al. (2008) use our representation to construct multiply robust estimators of the parameter encoding the interaction on an additive and multiplicative scale between two exposures A1 and A2 in their effects on an outcome Y.

In the special situation where either Y or A has finite support, Bickel et al. (1993) provide a closed-form expression for Π(R | Λnuis⊥), which Chen, however, did not use to give a closed-form expression for Seff. We remedy this oversight and obtain doubly robust locally-efficient closed-form estimating functions when Y and/or A has finite support; some emphasis is given to the important case of dichotomous Y which, incidentally, coincides with the semiparametric logistic regression model.

In the following, for a vector υ we write υ⊗2 = υυᵀ. To simplify notation, we suppose y0 = 0 and a0 = 0 throughout, so that γ(Y, 0, L; ψ) = γ(0, A, L; ψ) = γ(Y, A, L; 0) = 1. We shall also use the following definition.

Definition 1. Given conditional densities f†(Y | L) and g†(A | L), the density h†(Y, A | L) = f†(Y | L)g†(A | L) that makes A and Y conditionally independent given L is an admissible independence density if the joint law of (Y, A) given L under h†(·, · | L) is absolutely continuous with respect to the true law of (Y, A) given L with probability one. Furthermore, E†(· | ·, L) denotes conditional expectations with respect to h†(Y, A | L).

3. Main result

As noted previously, under model 𝒜 characterized by restriction (1), Chen showed that Λnuis⊥ is given by the set (3). We now provide a new closed-form representation of this set. To do so, for a fixed choice of admissible independence density h†(Y, A | L) = f†(Y | L)g†(A | L) and any p-dimensional function d of (Y, A, L), define the random vector U(ψ; d, h†) as

$$U(\psi; d, h^\dagger) = \{d(Y, A, L) - d^\dagger(Y, A, L)\}\, \frac{h^\dagger(Y, A \mid L)}{h(Y, A \mid L; \psi)},$$

with h(Y, A | L; ψ) defined in (2) and d†(Y, A, L) = E†(D | A, L) + E†(D | Y, L) − E†(D | L) for D ≡ d(Y, A, L). The following theorem gives the influence functions of regular asymptotically linear estimators of ψ0 in model 𝒜 and will form the basis for our doubly robust approach.

Theorem 1. Given an admissible independence density h†, an alternative representation of the set Λnuis⊥ of (3) is Λnuis⊥ = {U(ψ0; d, h†) : d unrestricted} ∩ L2.

Proof. One can verify by explicit calculation that {U(ψ0; d, h†) : d} ∩ L2 ⊆ Λnuis⊥. To show the other direction, take any function υ(A, Y, L) in Λnuis⊥ and let d(Y, A, L) = υ(A, Y, L)h(A, Y | L)/h†(Y, A | L). Then υ(A, Y, L) = U(ψ0; d, h†), since ∫ d(y, A, L) f†(y | L) dμ(y) = ∫ υ(A, y, L) f(y | A, L)g(A | L)/g†(A | L) dμ(y) = E{υ(A, Y, L) | A, L}g(A | L)/g†(A | L) = 0 and ∫ d(Y, a, L)g†(a | L) dμ(a) = E{υ(A, Y, L) | Y, L} f(Y | L)/f†(Y | L) = 0, so that d†(Y, A, L) = 0.

Remark. We give an alternative, more abstract, proof of the fact that U(ψ0; d, h†) ≡ {d(Y, A, L) − d†(Y, A, L)}h†(Y, A | L)/h(Y, A | L; ψ0) ∈ Λnuis⊥. Given an admissible independence density h†, let Λnuis⊥,† be the set (3) with expectations taken under h†. It is well known that when, as under h†, A and Y are conditionally independent given L, Λnuis⊥,† admits the representation {d(Y, A, L) − d†(Y, A, L) : d}. Then d(Y, A, L) − d†(Y, A, L) ∈ Λnuis⊥,† implies {d(Y, A, L) − d†(Y, A, L)}h†(Y, A | L)/h(Y, A | L; ψ0) ∈ Λnuis⊥, by the Radon–Nikodym theorem.
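
The first inclusion in the proof of Theorem 1 can be verified numerically on a finite support. The sketch below is an illustration under assumed inputs, not part of the original argument: it fixes L, takes y0 = a0 = 0, and uses a hypothetical true law, independence density and function d. It checks that U has mean zero given A and given Y under the true law.

```python
import math

ys, as_ = [0, 1, 2], [0, 1]

# True conditional law h(y, a): odds ratio exp(psi0 * y * a) times baseline
# densities, normalised over the support (all numbers are hypothetical).
psi0 = 0.8
f0 = {0: 0.5, 1: 0.3, 2: 0.2}
g0 = {0: 0.6, 1: 0.4}
weights = {(y, a): math.exp(psi0 * y * a) * f0[y] * g0[a] for y in ys for a in as_}
total = sum(weights.values())
h = {key: w / total for key, w in weights.items()}

# An arbitrary admissible independence density h_dag = f_dag * g_dag.
f_dag = {0: 0.2, 1: 0.5, 2: 0.3}
g_dag = {0: 0.7, 1: 0.3}

def d(y, a):
    """An arbitrary user-supplied function with a genuine (y, a) interaction."""
    return (y + 1) ** 2 * math.sin(1.0 + a) + y - 2.0 * a

def d_dagger(y, a):
    """E_dag(D | a) + E_dag(D | y) - E_dag(D), with expectations under h_dag."""
    e_given_a = sum(d(v, a) * f_dag[v] for v in ys)
    e_given_y = sum(d(y, b) * g_dag[b] for b in as_)
    e_marginal = sum(d(v, b) * f_dag[v] * g_dag[b] for v in ys for b in as_)
    return e_given_a + e_given_y - e_marginal

def U(y, a):
    """U(psi0; d, h_dag) evaluated at the true law h."""
    return (d(y, a) - d_dagger(y, a)) * f_dag[y] * g_dag[a] / h[(y, a)]

# Conditional means of U under the true law h.
mean_given_a = {a: sum(U(y, a) * h[(y, a)] for y in ys) / sum(h[(y, a)] for y in ys)
                for a in as_}
mean_given_y = {y: sum(U(y, a) * h[(y, a)] for a in as_) / sum(h[(y, a)] for a in as_)
                for y in ys}
print(mean_given_a, mean_given_y)
```

Both sets of conditional means vanish up to rounding error even though U itself is not identically zero, whatever independence density is chosen.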

By standard semiparametric theory (Bickel et al., 1993), Theorem 1 implies that if ψ̂ is a regular and asymptotically linear estimator of ψ0 in model 𝒜, then given any admissible independence density h†, there exists a p-dimensional function D ≡ d(O) such that $n^{1/2}(\hat\psi - \psi_0) = n^{1/2} P_n[E\{\partial U(\psi; d, h^\dagger)/\partial\psi^{T}|_{\psi=\psi_0}\}^{-1} U(\psi_0; d, h^\dagger)] + o_p(1)$. Furthermore, this also implies that any regular and asymptotically linear estimator of ψ0 in model 𝒜 can be obtained, up to asymptotic equivalence, as the solution to an equation $\sum_{i=1}^{n} U_i(\psi; d, h^\dagger) = 0$. However, these solutions are infeasible because h(Y, A | L; ψ) depends on the unknown conditional densities f(y | L, A = 0) and g(a | Y = 0, L), which must be estimated from the data. While a nonparametric smoothing method would, in principle, be the preferred approach to estimate these densities, its finite-sample performance is bound to be poor for continuous L of moderate to high dimension because of the curse of dimensionality. A practical alternative is to proceed as in Chen (2007) and to impose working models of reduced dimension for the unknown baseline functions f(Y | A = 0, L) and g(A | Y = 0, L). Hence, we specify variation independent parametric models g(A | Y = 0, L; α) for g(A | Y = 0, L) and f(Y | A = 0, L; θ) for f(Y | A = 0, L) with unknown finite-dimensional parameters α and θ. Since we cannot be sure that either f(Y | A = 0, L; θ) or g(A | Y = 0, L; α) is correctly specified, we shall construct a doubly robust estimator of ψ0 that is guaranteed to be consistent and asymptotically normal if either, but not necessarily both, of these working models is correct.

To do so, we adopt the notational convention introduced in § 1 that, given a function such as U(ψ0; d, h†) which depends on the unknown law h(Y, A | L; ψ0), we let U(ψ, θ, α; d, h†) be the function U(ψ0; d, h†) evaluated at the law {γ(y, a, l; ψ), f(y | 0, l; θ), g(a | 0, l; α)}. Then, Theorem 2 shows that, under standard regularity conditions, ψ̂ ≡ ψ̂(d; ĥ†) is doubly robust, where ψ̂(d; ĥ†) is the solution to

$$\sum_{i=1}^{n} U_i\{\psi, \hat\theta(\psi), \hat\alpha(\psi); d, \hat h^\dagger\} = 0,$$

d(Y, A, L) is a user-supplied function,

$$\hat\alpha(\psi) = \arg\max_{\alpha} \sum_{i=1}^{n} \log\{g(A_i \mid Y_i, L_i; \psi, \alpha)\}$$

is the profile maximum likelihood estimator of α at a fixed ψ, g(A | Y, L; ψ, α) = γ(Y, A, L; ψ)g(A | Y = 0, L; α)/∫ g(a | Y = 0, L; α)γ(Y, a, L; ψ) dμ(a),

$$\hat\theta(\psi) = \arg\max_{\theta} \sum_{i=1}^{n} \log\{f(Y_i \mid A_i, L_i; \psi, \theta)\}$$

is the profile maximum likelihood estimator of θ at a fixed ψ, f(Y | A, L; ψ, θ) = γ(Y, A, L; ψ) f(Y | A = 0, L; θ)/∫ γ(y, A, L; ψ) f(y | A = 0, L; θ) dμ(y), and ĥ†(Y, A | L) ≡ f†(Y | L; ω̂f)g†(A | L; ω̂g). Here f†(Y | L; ω̂f) is a user-specified density when ω̂f is chosen to be nonrandom; otherwise, it is a user-supplied parametric model f†(Y | L; ωf) for the density of Y given L, evaluated at the value ω̂f maximizing $\prod_{i=1}^{n} f^\dagger(Y_i \mid L_i; \omega_f)$. Similarly, g†(A | L; ω̂g) is a user-specified density when ω̂g is chosen to be nonrandom; otherwise, it is a user-supplied parametric model g†(A | L; ωg) for the density of A given L, evaluated at the value ω̂g maximizing $\prod_{i=1}^{n} g^\dagger(A_i \mid L_i; \omega_g)$.

Theorem 2. Suppose ĥ†(Y, A | L) converges in probability to an admissible independence density h†(Y, A | L). Then, subject to the regularity conditions given in the Appendix, under the union model B characterized by (1) and the assumption that either the model f(y | L, A = 0; θ) or g(a | Y = 0, L; α) is correct, $n^{1/2}(\hat\psi - \psi_0)$ is regular and asymptotically linear, with influence function


and thus converges in distribution to N(0, Σ), where


with θ*(ψ) and α*(ψ) denoting the probability limits of θ̂(ψ) and α̂(ψ), respectively, and


where C(ψ, θ) = ∂ log{f(Y | A, L; ψ, θ)}/∂θ and B(ψ, α) = ∂ log{g(A | Y, L; ψ, α)}/∂α are the scores for θ and α, respectively.

A consistent estimator of Σ is


where M̂ is defined as M but with expectations replaced by their empirical versions. Thus, Σ̂ can easily be used to obtain Wald-type confidence intervals for components of ψ0.

Remark. When ω̂f and/or ω̂g are random, the asymptotic distribution of ψ̂(d; ĥ†) is equal to that of ψ̂(d; h†), with h† = f† × g† = f†(Y | L) × g†(A | L) the probability limit of f̂† × ĝ† = f†(Y | L; ω̂f) × g†(A | L; ω̂g). In practice, it will be convenient to use an estimated density ĥ† rather than a fixed choice h†.
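
The double robustness asserted in Theorem 2 rests on the estimating function having mean zero at ψ0 when only one baseline model is correct. The sketch below checks this at the population level for binary Y and A with L held fixed. It is a simplified illustration: the wrong baselines are arbitrary fixed choices rather than probability limits of profile likelihood fits, and all numerical values and function names are assumptions.

```python
import math

ys, as_ = [0, 1], [0, 1]
psi0 = 0.7

def joint(psi, f_base, g_base):
    """Law implied by odds ratio exp(psi*y*a), f(y | A = 0) and g(a | Y = 0)."""
    w = {(y, a): math.exp(psi * y * a) * f_base[y] * g_base[a] for y in ys for a in as_}
    total = sum(w.values())
    return {key: v / total for key, v in w.items()}

f_true, g_true = {0: 0.6, 1: 0.4}, {0: 0.7, 1: 0.3}     # data-generating baselines
f_wrong, g_wrong = {0: 0.3, 1: 0.7}, {0: 0.5, 1: 0.5}   # misspecified baselines
h_true = joint(psi0, f_true, g_true)

f_dag, g_dag = {0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}       # independence density

def d(y, a):
    return float(y * a)

def d_dagger(y, a):
    e_given_a = sum(d(v, a) * f_dag[v] for v in ys)
    e_given_y = sum(d(y, b) * g_dag[b] for b in as_)
    e_marginal = sum(d(v, b) * f_dag[v] * g_dag[b] for v in ys for b in as_)
    return e_given_a + e_given_y - e_marginal

def mean_U(psi, f_work, g_work):
    """E{U(psi, f_work, g_work; d, h_dag)} under the true law h_true."""
    h_work = joint(psi, f_work, g_work)
    return sum((d(y, a) - d_dagger(y, a)) * f_dag[y] * g_dag[a] / h_work[(y, a)]
               * h_true[(y, a)] for y in ys for a in as_)

only_f_correct = mean_U(psi0, f_true, g_wrong)
only_g_correct = mean_U(psi0, f_wrong, g_true)
both_wrong = mean_U(psi0, f_wrong, g_wrong)
print(only_f_correct, only_g_correct, both_wrong)
```

The first two means are zero up to rounding, while the third is not, so the estimating function is unbiased under the union model but not outside it.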

4. Local efficiency

We first consider the case in which both A and Y contain continuous components. Chen’s estimator ψ̂eff solving Pn{Ŝeff(ψ)} = 0 is locally efficient in model B at the intersection submodel. However, as previously noted, the estimated efficient score Ŝeff(ψ) does not exist in closed form when ψ ≠ 0, and the alternating conditional expectations algorithm is needed to compute Ŝeff(ψ) and thus ψ̂eff. In this section, we propose estimators that exploit our representation of the set (3) and thus are easier to compute than ψ̂eff, and yet are nearly locally efficient, i.e. have asymptotic variance almost equal to that of ψ̂eff at the intersection submodel. The first estimator ψ̂(dind, ĥ†ind) is the easiest to compute, although its asymptotic variance is close to that of ψ̂eff only when all components of ψ0 are close to zero; nonetheless ψ̂(dind, ĥ†ind) will be useful in practice, because in many epidemiologic studies, the investigator will know from previously published results that all components of ψ0 are small. Specifically, we set dind(Y, A, L) ≡ [∂ log{γᵀ(Y, A, L; ψ)}/∂ψ]|ψ=0 and ĥ†ind(Y, A | L) = f̂†ind(Y | L)ĝ†ind(A | L), with f̂†ind(Y | L) ≡ f{Y | L, A; ψ = 0, θ̂(ψ = 0)} and ĝ†ind(A | L) ≡ g{A | Y, L; ψ = 0, α̂(ψ = 0)}. When the true parameter ψ0 is 0 and thus A and Y are independent given L, ψ̂eff and ψ̂(dind, ĥ†ind) have identical limiting distributions under the union model B. This result follows from the fact that, when ψ0 = 0, Seff(ψ) = dind(Y, A, L) − d†ind(Y, A, L) with h†(Y, A | L) equal to the true density h(Y, A | L) (Chen, 2007). By continuity, the asymptotic variances of ψ̂(dind, ĥ†ind) and ψ̂eff will be close whenever ψ0 is nearly zero.

When ψ0 is not known to be nearly zero, we adopt a general approach proposed by Newey (1993). We take a basis system ϕj(A, Y, L) (j = 1, . . .) of functions dense in L2, such as tensor products of trigonometric, wavelet or polynomial bases when the components of A, Y and L are all continuous. For some finite K > dim(ψ), we form the K-dimensional vector U(ψ; ϕ̃K, h†) with ϕ̃K the vector of the first K basis functions and let ŴK(ψ) ≡ U{ψ, θ̂(ψ), α̂(ψ); ϕ̃K, ĥ†} and $\hat\Gamma_K(\tilde\psi) = \sum_{i=1}^{n} \hat W_{K,i}(\tilde\psi) \hat W_{K,i}^{T}(\tilde\psi)$, where ψ̃ is any preliminary doubly robust estimator of ψ0. Let ψ̂K,eff ≡ ψ̂K,eff(ϕ̃K, ĥ†) be the minimizer of the quadratic form $\{\sum_{i=1}^{n} \hat W_{K,i}(\psi)\}^{T} \hat\Gamma_K^{-}(\tilde\psi) \{\sum_{i=1}^{n} \hat W_{K,i}(\psi)\}$, with $\hat\Gamma_K^{-}(\tilde\psi)$ a generalized inverse of $\hat\Gamma_K(\tilde\psi)$. Then ψ̂K,eff is consistent and asymptotically normal in the semiparametric union model B; furthermore, with K chosen sufficiently large, the asymptotic variance of $n^{1/2}(\hat\psi_{K,\mathrm{eff}} - \psi_0)$ nearly attains the semiparametric efficiency bound for the union model at the intersection submodel with both nuisance models correct. In particular, the inverse of the asymptotic variance of ψ̂K,eff at the intersection submodel is

$$\Omega_K = E\{S_\psi U^{T}(\psi_0; \tilde\phi_K, h^\dagger)\}\, \Gamma_K^{-}\, E\{U(\psi_0; \tilde\phi_K, h^\dagger) S_\psi^{T}\},$$

where ΓK⁻ is a generalized inverse of ΓK = E{U(ψ0; ϕ̃K, h†)Uᵀ(ψ0; ϕ̃K, h†)}. Thus, ΩK is the variance of the population least squares regression of Sψ on the linear span of U(ψ0; ϕ̃K, h†). Because the basis functions are dense in L2, as K → ∞ the components of U(ψ0; ϕ̃K, h†) become dense in Λnuis⊥, so that ΩK → ‖Π(Sψ | Λnuis⊥)‖² = var(Sψ,eff), the semiparametric information bound for estimating ψ0 under model B.

Neither of these two strategies is needed if Y or A has finite support, because an explicit form for the efficient score in this case was given by Bickel et al. (1993). Without loss of generality, assume Y has finite support, say {y0, y1, . . ., yM−1}, with y0 = 0. In the following, we use the result obtained by Bickel et al. (1993) to construct a doubly robust locally-efficient estimating function in model B. We then demonstrate that this estimating function is in fact a particular member of the class of estimating functions in § 3. For clarity of exposition, this demonstration is restricted to the case of dichotomous Y, but can be easily extended to Y with arbitrary finite support.

Consider the vector {I(Y = y1), . . ., I(Y = yM−1)}, which we again denote by Y. Next, let Ψ(A, L; ψ0) = E{ε(ψ0)⊗2 | A, L} and let k ↦ Ũ(ψ0; k) = [k(A, L) − 𝔼{k(A, L) | L; ψ0}] × ε(ψ0) be a function that maps the space of p × (M − 1) matrix functions of A and L into L2, where 𝔼{k(A, L) | L; ψ0} = E{k(A, L) × Ψ(A, L; ψ0) | L} × E{Ψ(A, L; ψ0) | L}⁻¹ and ε(ψ0) = Y − E(Y | A, L; ψ0). Then, by Theorem A.4.5 of Bickel et al. (1993), the closed linear set {Ũ(ψ0; k) : k = k(A, L) unrestricted} ∩ L2, as k varies over the set of all p × (M − 1)-dimensional functions of A and L, is equal to the set Λnuis⊥ for model 𝒜.

Furthermore, Bickel et al. (1993) show that Ũ{ψ0; keff(ψ0)} is the efficient score function of ψ in model 𝒜, where keff(ψ0) = [∂ log{ρᵀ(A, L; ψ)}/∂ψ]|ψ=ψ0, with ρ(A, L; ψ) defined to be the (M − 1) × 1 vector with jth component equal to γ(yj, A, L; ψ), j = 1, . . ., M − 1. Robins and Rotnitzky (2001) prove that the efficient scores in models 𝒜 and B are identical at the intersection submodel. Therefore, an estimator of ψ0 in model B that is doubly robust and locally efficient at the intersection submodel is obtained by solving either $\sum_{i=1}^{n} \tilde U_i\{\psi; k_{\mathrm{eff}}(\psi), \hat\theta(\psi), \hat\alpha(\psi)\} = 0$ or $\sum_{i=1}^{n} \tilde U_i\{\psi; k_{\mathrm{eff}}(\hat\psi_{\mathrm{mle}}), \hat\theta(\psi), \hat\alpha(\psi)\} = 0$, where Ũ(ψ; keff, θ, α) is equal to the function Ũ(ψ; keff) evaluated at the law {γ(y, a, l; ψ), f(y | A = 0, l; θ), g(a | Y = 0, l; α)} and (ψ̂mle, θ̂mle, α̂mle) is the maximum likelihood estimator in the parametric model h(A, Y | L; ψ, α, θ) for h(A, Y | L).

We next derive a doubly robust locally-efficient estimating function U(ψ, θ, α; deff, h†) in our class that equals Ũ(ψ; keff, θ, α), in the special case where Y is dichotomous. This case is of particular interest as model 𝒜 is then equivalent to the familiar semiparametric logistic regression model

$$\operatorname{logit}\{\operatorname{pr}(Y = 1 \mid A, L)\} = \log\{\gamma(y_1, A, L; \psi_0)\} + \eta(L),$$

where y1 = 1 and η(L) = log[Pr(Y = 1 | A = 0, L)/{1 − Pr(Y = 1 | A = 0, L)}] is an unrestricted function of L. Since Y is binary, any function d(Y, A, L) may be written as Ym(A, L) + n(A, L) with m(A, L) = d(1, A, L) − d(0, A, L) and n(A, L) = d(0, A, L). Given an admissible independence density h†(Y, A | L) = f†(Y | L)g†(A | L), let r ↦ V(ψ0; r, h†) = {r(A, L) − r†(L)} × (−1)^(1−Y) g†(A | L)/h(Y, A | L; ψ0) be a function that maps the space of p-dimensional functions of A and L into L2, where r†(L) ≡ E†{r(A, L) | L}. For a given choice of h† and d(Y, A, L), U(ψ0; d, h†) simplifies to V(ψ0; r, h†) with r(A, L) = m(A, L) f†(1 | L){1 − f†(1 | L)}.
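
The simplification of U to V for binary Y stated above can be checked numerically. The sketch below holds L fixed and uses hypothetical choices for the law, the independence density and the function d; it compares U(ψ0; d, h†) with V(ψ0; r, h†) for r(A) = m(A) f†(1){1 − f†(1)} at every support point.

```python
import math

ys, as_ = [0, 1], [0, 1, 2]

# Joint law h(y, a; psi0) built from a hypothetical odds ratio and baselines.
psi0 = 0.5
f0 = {0: 0.7, 1: 0.3}
g0 = {0: 0.5, 1: 0.3, 2: 0.2}
weights = {(y, a): math.exp(psi0 * y * a) * f0[y] * g0[a] for y in ys for a in as_}
total = sum(weights.values())
h = {key: w / total for key, w in weights.items()}

f_dag = {0: 0.65, 1: 0.35}             # f_dag(y | L)
g_dag = {0: 0.2, 1: 0.5, 2: 0.3}       # g_dag(a | L)

def d(y, a):
    return y * math.cos(a) + a ** 2    # d = Y * m(A) + n(A)

def m(a):
    return d(1, a) - d(0, a)

def d_dagger(y, a):
    e_given_a = sum(d(v, a) * f_dag[v] for v in ys)
    e_given_y = sum(d(y, b) * g_dag[b] for b in as_)
    e_marginal = sum(d(v, b) * f_dag[v] * g_dag[b] for v in ys for b in as_)
    return e_given_a + e_given_y - e_marginal

def U(y, a):
    return (d(y, a) - d_dagger(y, a)) * f_dag[y] * g_dag[a] / h[(y, a)]

def r(a):
    return m(a) * f_dag[1] * (1.0 - f_dag[1])

r_bar = sum(r(a) * g_dag[a] for a in as_)   # E_dag{r(A) | L}

def V(y, a):
    return (r(a) - r_bar) * (-1) ** (1 - y) * g_dag[a] / h[(y, a)]

max_gap = max(abs(U(y, a) - V(y, a)) for y in ys for a in as_)
print(max_gap)
```

The agreement follows because d − d† reduces to {Y − f†(1)}{m(A) − E†(m)} when Y is binary, and {Y − f†(1)} f†(Y) equals (−1)^(1−Y) f†(1){1 − f†(1)}.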

Furthermore, by


we have that


Thus, since Seff = Ũ (keff) is the efficient score, we conclude that V {ψ0; reff (h; ψ0), h} = Seff with


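Therefore, the solution to either of the following estimating equations is doubly robust and locally semiparametric efficient: $\sum_{i=1}^{n} U_i[\psi, \hat\theta(\psi), \hat\alpha(\psi); d_{\mathrm{eff}}\{\psi, \hat\theta(\psi), \hat\alpha(\psi), \hat h^\dagger\}, \hat h^\dagger] = 0$ or $\sum_{i=1}^{n} U_i[\psi, \hat\theta(\psi), \hat\alpha(\psi); d_{\mathrm{eff}}(\hat\psi_{\mathrm{mle}}, \hat\theta_{\mathrm{mle}}, \hat\alpha_{\mathrm{mle}}, \hat h^\dagger), \hat h^\dagger] = 0$, where deff(Y, A, L; ψ, θ, α, ĥ†) = Y reff(ĥ†; ψ, θ, α), α̂(ψ) is as defined earlier and $\hat\theta(\psi) = \arg\max_{\theta} \sum_{i=1}^{n} [Y_i \log\{b(A_i, L_i; \psi, \theta)\} + (1 - Y_i) \log\{1 - b(A_i, L_i; \psi, \theta)\}]$ with logit{b(A, L; ψ, θ)} = log{γ(1, A, L; ψ)} + η(L; θ). More precisely, each solution is regular and asymptotically linear under model B and attains the semiparametric efficiency bound for the model at the intersection submodel.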

5. Discussion

Although the common variation independent parameterization of h(A, Y | L) fL(L) under model 𝒜 with Y binary is (ψ, f, g, fL) with fL = fL(l), f = f(y | l, A = 0) and g = g(a | l), we instead used the parameterization of Chen (2007) that has g = g(a | Y = 0, l) rather than g = g(a | l). Our use of Chen’s parameterization was the key to our obtaining the doubly robust estimating functions for ψ and hence doubly robust estimators. Formally, following Robins and Rotnitzky (2001), a function S(ψ, f*, g*) = s(O; ψ, f*, g*) of a single subject’s data O is said to be doubly robust for ψ under a particular parameterization for model 𝒜 if, when either f = f* or g = g*, but not necessarily both, (i) $E_{\psi,f,g,f_L}\{S(\psi, f^*, g^*)\} = 0$ and $\mathrm{var}_{\psi,f,g,f_L}\{S(\psi, f^*, g^*)\} < \infty$ for all ψ and (ii) $\partial[E_{\psi^*,f,g,f_L}\{S(\psi, f^*, g^*)\}]/\partial\psi\,|_{\psi=\psi^*} \neq 0$ for all ψ*, f, g, fL. Part (ii) guarantees power against local alternatives. As shown in the Appendix, U(ψ, θ, α; d, h†) and Ũ(ψ; keff, θ, α) satisfy this definition under Chen’s parameterization with f*(y | l, A = 0) = f(y | l, A = 0; θ) and g*(a | Y = 0, l) = g(a | Y = 0, l; α). In contrast, no doubly robust estimating function for ψ exists under the common parameterization. In fact, the following result holds.

Theorem 3. Under the common parameterization by (ψ, f, g, fL) with fL = fL(l), f = f(l) = f(y | l, A = 0) and g = g(a | l), there does not exist a doubly robust estimating function S(ψ, f*, g*) = s(O; ψ, f*, g*) in model 𝒜 with Y binary characterized by the sole restriction (1).

In the Appendix, we prove this result for discrete A, thereby avoiding technicalities that arise in the continuous case.


Andrea Rotnitzky and James Robins were funded by grants from the U.S. National Institutes of Health. The authors wish to thank the reviewers for helpful comments. Andrea Rotnitzky is also affiliated with the Harvard School of Public Health.


Appendix

Proof of Theorem 2. We assume that the regularity conditions of Theorem 1A of Robins et al. (1992) hold for U(ψ0, θ, α; d, h†), C(ψ0, θ) and B(ψ0, α), and that E[∂M{ψ, θ*(ψ), α*(ψ); d, h†}/∂ψ|ψ=ψ0] is nonsingular. We first show that E[U{ψ0, θ*(ψ0), α*(ψ0); d, h†}] = 0 when the data are generated under either f(Y | A, L; ψ0, θ0) or g(A | Y, L; ψ0, α0). By symmetry, it is enough to consider the case where the data were generated under f(Y | A, L; ψ0, θ0). Under standard conditions guaranteeing the consistency of the maximum likelihood estimator, θ*(ψ0) = θ0. Now, under f(Y | A, L; ψ0, θ0),




Then, under the assumed regularity conditions, the formulae (4) and (5) follow from standard Taylor series arguments, whenever E[∂M{ψ, θ*(ψ), α*(ψ); d, h†}/∂ψ|ψ=ψ0] is nonsingular. The asymptotic normality result follows from the standard application of Slutsky’s theorem and the central limit theorem.
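
The unbiasedness step of this proof can be sketched as follows. This is a reconstruction under the stated model rather than the authors' own display. The symbol g*(A | L) is introduced here only for illustration and denotes the conditional density of A given L implied by the working law {γ(·; ψ0), f(· | 0, L; θ0), g(· | 0, L; α*(ψ0))}. Because the conditional density of Y given (A, L) under that law depends only on γ and on the baseline f, the working joint density factorizes as f(Y | A, L; ψ0, θ0) g*(A | L), and conditioning on (A, L) gives

```latex
\begin{align*}
E\bigl[U\{\psi_0,\theta_0,\alpha^*(\psi_0);d,h^\dagger\}\mid A,L\bigr]
 &= \int \{d(y,A,L)-d^\dagger(y,A,L)\}\,
    \frac{f^\dagger(y\mid L)\,g^\dagger(A\mid L)}
         {f(y\mid A,L;\psi_0,\theta_0)\,g^*(A\mid L)}\,
    f(y\mid A,L;\psi_0,\theta_0)\,d\mu(y) \\
 &= \frac{g^\dagger(A\mid L)}{g^*(A\mid L)}\,
    E^\dagger\{D-d^\dagger(Y,A,L)\mid A,L\} \\
 &= \frac{g^\dagger(A\mid L)}{g^*(A\mid L)}\,
    \bigl\{E^\dagger(D\mid A,L)-E^\dagger(D\mid A,L)
          -E^\dagger(D\mid L)+E^\dagger(D\mid L)\bigr\} = 0 .
\end{align*}
```

The last line uses the fact that, under the independence density h†, the conditional expectation of E†(D | Y, L) given (A, L) equals E†(D | L). Taking expectations over (A, L) then yields a mean of zero whatever the value of α*(ψ0).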

Proof of Theorem 3. The proof is by contradiction: if S(ψ, f*, g*) were doubly robust, then, for every f*, S(ψ, f*, g*) would be an unbiased estimating function for ψ with power against local alternatives in the submodel 𝒜g* of model 𝒜 in which g = g* is known a priori. Hence, it suffices to prove that model 𝒜g* does not admit such unbiased estimating functions. Noting that model 𝒜g* can be parameterized by (ψ, f, fL), we need to prove there is no function Q(ψ) = q(O; ψ) such that $E_{\psi,f,f_L}\{Q(\psi)\} = 0$, $\mathrm{var}_{\psi,f,f_L}\{Q(\psi)\} < \infty$ and $\partial[E_{\psi^*,f,f_L}\{Q(\psi)\}]/\partial\psi\,|_{\psi=\psi^*} \neq 0$ for all ψ*, f, fL. Now Bickel et al. (1993) proved that an unbiased estimating function for a parameter ψ lies in the orthocomplement to the nuisance tangent space for the model. For model 𝒜g*, it is straightforward to show that the orthocomplement to the nuisance tangent space at law (ψ, f, fL), say Λnuis,g*⊥(ψ, f), does not depend on fL and is the direct sum of the orthocomplement for model 𝒜 plus the space of functions υ = υ(A, L) of (A, L) with zero mean given L. Thus,

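$$\Lambda_{\mathrm{nuis},g^*}^{\perp}(\psi, f) = \bigl[T(k, \upsilon; \psi, f) = \tilde U(\psi, f; k) + \upsilon(A, L) : k \text{ unrestricted},\ \upsilon \text{ with } E\{\upsilon(A, L) \mid L\} = 0\bigr] \cap L_2,$$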

where Ũ(ψ, f; k) = ε(ψ, f)[k(A, L) − 𝔼{k(A, L) | L; ψ, f}] is Ũ(ψ; k) defined in § 4 with the dependence on f now made explicit. Suppose Q(ψ) existed. Then Q(ψ) = T(kf, υf; ψ, f) ∈ Λnuis,g*⊥(ψ, f) holds for each f, where kf and υf are the particular functions k and υ associated with a given f. Thus, T(kf, υf; ψ, f) = T(kf*, υf*; ψ, f*) for any f, f*. Noting that ε(ψ, f) ≡ Y − pr(Y = 1 | A, L; ψ, f), the previous equality implies that Y([kf(A, L) − 𝔼{kf(A, L) | L; ψ, f}] − [kf*(A, L) − 𝔼{kf*(A, L) | L; ψ, f*}]) is equal to a function that does not depend on Y. Hence, it must be that

$$k_f(A, L) - \mathbb{E}\{k_f(A, L) \mid L; \psi, f\} = k_{f^*}(A, L) - \mathbb{E}\{k_{f^*}(A, L) \mid L; \psi, f^*\}.$$

Thus, for a function r(L), kf*(A, L) = kf(A, L) + r(L). Substituting for kf*(A, L) in the last display we obtain kf(A, L) − 𝔼{kf(A, L) | L; ψ, f} = kf(A, L) + r(L) − 𝔼{kf(A, L) | L; ψ, f*} − r(L) and hence 𝔼{kf(A, L) | L; ψ, f} = 𝔼{kf(A, L) | L; ψ, f*} with probability one for all f, f*, ψ which, as shown in the next paragraph, implies that kf(A, L) is not a function of A, i.e. kf(A, L) = k(L) with probability one for some k(L). But kf(A, L) not being a function of A implies $\partial[E_{\psi^*,f,f_L}\{Q(\psi)\}]/\partial\psi\,|_{\psi=\psi^*} = 0$, which is a contradiction. We conclude that no unbiased estimating function Q(ψ) with power against local alternatives exists.

We show that, for A binary, 𝔼{h(A, L) | L; ψ, f} depends on f on a set of nonzero probability whenever the conditional odds ratio function γ(1, 1, L; ψ) ≠ 1 with probability one and h(A, L) = h1(L)A + h0(L) depends on A, i.e. whenever h1(L) is nonzero with positive probability. Let f(l) denote f(1 | A = 0, l). When γ(1, 1, L; ψ) ≠ 1 with probability one, 𝔼(A | L; ψ, f) = [1 + {g(A = 0 | L)/g(A = 1 | L)}{1 − f(L) + γ(1, 1, L) f(L)}²/γ(1, 1, L)]⁻¹ obviously depends on f(L) with probability one, which implies that 𝔼{h(A, L) | L; ψ, f} depends on f on the set where h1(L) is nonzero. The proof for arbitrary discrete A is identical except that extra bookkeeping is required.
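
The closed-form expression for the weighted expectation of a binary A used at the end of this proof can be checked directly. The sketch below holds L fixed, takes the weight to be the conditional variance of Y given A, and compares the ratio E{AΨ(A)}/E{Ψ(A)} with the closed form. The numerical values and function names are hypothetical.

```python
def pr_y1(a, gam, f):
    """Pr(Y = 1 | A = a) when Pr(Y = 1 | A = 0) = f and the odds ratio is gam."""
    odds = (gam if a == 1 else 1.0) * f / (1.0 - f)
    return odds / (1.0 + odds)

def weighted_mean_of_A(gam, f, g1):
    """E{A * Psi(A)} / E{Psi(A)} with Psi(a) = var(Y | A = a) and g1 = g(A = 1 | L)."""
    var_y = {a: pr_y1(a, gam, f) * (1.0 - pr_y1(a, gam, f)) for a in (0, 1)}
    g = {0: 1.0 - g1, 1: g1}
    numerator = sum(a * var_y[a] * g[a] for a in (0, 1))
    denominator = sum(var_y[a] * g[a] for a in (0, 1))
    return numerator / denominator

def closed_form(gam, f, g1):
    """[1 + {g(0)/g(1)} {1 - f + gam * f}^2 / gam]^(-1)."""
    return 1.0 / (1.0 + ((1.0 - g1) / g1) * (1.0 - f + gam * f) ** 2 / gam)

gam, g1 = 3.0, 0.4
for f in (0.2, 0.6):
    print(f, weighted_mean_of_A(gam, f, g1), closed_form(gam, f, g1))
```

With an odds ratio different from one the value changes with the baseline f, whereas with an odds ratio of one it reduces to g(A = 1 | L) for every f, which is the dependence the argument exploits.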


References

  • Bickel P, Klaassen C, Ritov Y, Wellner J. Efficient and Adaptive Estimation for Semiparametric Models. New York: Springer; 1993.
  • Chen HY. A note on prospective analysis of outcome-dependent samples. J. R. Statist. Soc. B. 2003;65:575–84.
  • Chen HY. Nonparametric and semiparametric models for missing covariates in parametric regression. J. Am. Statist. Assoc. 2004;99:1176–89.
  • Chen HY. A semiparametric odds ratio model for measuring association. Biometrics. 2007;63:413–21.
  • Newey W. Efficient estimation of models with conditional moment restrictions. In: Maddala GS, Rao CR, Vinod H, editors. Handbook of Statistics, IV. Amsterdam: Elsevier Science; 1993. pp. 427–61.
  • Robins JM, Mark SD, Newey WK. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–95.
  • Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, ‘Inference for semiparametric models: some questions and an answer’. Statist. Sinica. 2001;11:920–36.
  • Vansteelandt S, VanderWeele T, Tchetgen EJ, Robins JM. Semiparametric inference for statistical interactions. J. Am. Statist. Assoc. 2008;103:1693–704.
