
J Am Stat Assoc. Author manuscript; available in PMC 2012 November 7.

Published in final edited form as:

J Am Stat Assoc. 2010 September; 105(491): 1135–1146.

Published online 2012 January 1. doi: 10.1198/jasa.2010.tm08463

PMCID: PMC3491912

NIHMSID: NIHMS404700

Lu Wang, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109;

We consider nonparametric regression of a scalar outcome on a covariate when the outcome is missing at random (MAR) given the covariate and other observed auxiliary variables. We propose a class of augmented inverse probability weighted (AIPW) kernel estimating equations for nonparametric regression under MAR. We show that AIPW kernel estimators are consistent when the probability that the outcome is observed, that is, the selection probability, is either known by design or estimated under a correctly specified model. In addition, we show that a specific AIPW kernel estimator in our class that employs the fitted values from a model for the conditional mean of the outcome given covariates and auxiliaries is double-robust, that is, it remains consistent if this model is correctly specified even if the selection probabilities are modeled or specified incorrectly. Furthermore, when both models happen to be right, this double-robust estimator attains the smallest possible asymptotic variance of all AIPW kernel estimators and maximally extracts the information in the auxiliary variables. We also describe a simple correction to the AIPW kernel estimating equations that, while preserving double-robustness, ensures efficiency improvement over nonaugmented IPW estimation when the selection model is correctly specified, regardless of the validity of the second model used in the augmentation term. We perform simulations to evaluate the finite sample performance of the proposed estimators, and apply the methods to the analysis of the AIDS Costs and Services Utilization Survey data. Technical proofs are available online.

The existing missing data literature mainly focuses on estimation methods in parametric regression models, that is, models for the conditional mean of an outcome given covariates indexed by finite dimensional regression parameters. However, the functional form of the dependence of an outcome on a covariate is often unknown in advance and can be complicated (Hastie and Tibshirani 1990; Wand and Jones 1995). For example, Zhang, Lin, and Sowers (2000) found that the profile of progesterone level during a menstrual cycle follows a nonlinear pattern which is hard to fit using standard parametric models and is best fitted by nonparametric smoothing techniques. Likewise, Harezlak et al. (2007) found that the protein intensities from mass spectrometry are very complex and need to be fit using nonparametric smoothing methods. Limited literature is available for nonparametric regression in the presence of missing data.

Our work is motivated by the AIDS Costs and Services Utilization Survey (ACSUS) (Berk, Maffeo, and Schur 1993). The ACSUS sampled subjects with AIDS in 10 randomly selected United States cities with the highest AIDS rates. A question of interest in this study is how the risk of hospital admission one year after study enrollment is related to the baseline CD4 counts. Although it is known that a lower CD4 count is associated with a higher risk of hospitalization, the functional form of dependence is unknown and expected to be nonlinear with a potential threshold. We are hence interested in modeling this relationship nonparametrically. However, about 40% of the patients did not have the first-year hospital admission data available. As shown in Section 4, naive nonparametric regression using only complete data yields an inconsistent estimator of the mean curve when the missingness is not completely at random, a likely situation in this problem. It is therefore of interest to develop flexible nonparametric regression methods to estimate the effect of baseline CD4 counts on the risk of hospitalization that adequately adjust for outcomes missing at random (MAR), that is, missingness depending only on observed data (Little and Rubin 2002). In addition, because the fraction of missing outcomes is large, it is also important that the methodology maximally exploits the information in available auxiliary variables. The methods we develop in this paper are also useful for nonparametric regression estimation in two-stage studies (Pepe 1992), where the second-stage outcome is not observed for all study units and the probability of observing the outcome depends on the first-stage auxiliaries and covariates, but is independent of the outcome, that is, it is MAR.

Limited work has been done on nonparametric regression in the presence of missing data. Wang et al. (1998) considered estimation of a nonparametric regression curve with missing covariates. Liang et al. (2004) considered estimation of a partially linear model with missing covariates and described inverse probability weighted (IPW) estimation of the nonparametric component of the model. Chen et al. (2006) studied local quasi-likelihood estimation with missing outcomes when missingness depends only on the regression covariate. None of these articles considered, as we do here, the possibility that always observed auxiliaries are available, a case that arises often in practice. Our work differs in that we propose augmented inverse probability weighted (AIPW) kernel estimators that exploit the information in the auxiliary variables while at the same time allowing for the possibility that missingness may depend on them, thus making the MAR assumption more plausible.

In this paper we generalize kernel estimating equation methods (Wand and Jones 1995; Fan and Gijbels 1996; Carroll, Ruppert, and Welsh 1998) to accommodate outcomes missing at random in a similar spirit to IPW and AIPW methods for parametric regression (Robins, Rotnitzky, and Zhao 1994, 1995; Rotnitzky and Robins 1995; Rotnitzky, Holcroft, and Robins 1997; Robins 1999). After studying the properties of naive kernel estimating equations based on complete cases, we propose the IPW kernel estimating equations and a class of AIPW kernel estimating equations. We present the asymptotic properties of the solutions to these weighted kernel estimating equations and compare them in terms of asymptotic biases and variances. We argue that clever choices of the augmentation term can yield important efficiency gains over the IPW kernel estimators. The proposed IPW and AIPW kernel estimators are consistent under MAR if the missingness mechanism is known by design or can be parametrically modeled. Indeed, with one specific choice of the augmentation term, the AIPW kernel estimator confers some protection against model misspecification in that it remains consistent even if the model for the missingness probabilities is misspecified provided that a parametric model for the conditional mean of the outcome given the covariates and auxiliaries is correctly specified, a property known as double-robustness.

We consider a generalized nonparametric mean model when the outcome may be missing at random. Specifically, suppose the study design calls for a vector of variables (*Y _{i}*,

$$g({\mu}_{i})=\theta ({Z}_{i}),$$

(1)

where *g*(·) is a known monotonic link function (McCullagh and Nelder 1989) with a continuous first derivative, *μ _{i}* =

We assume that outcomes are MAR (Little and Rubin 2002), which in our setting amounts to assuming that

$$Pr({R}_{i}=1\mid {Z}_{i},{\mathbf{U}}_{i},{Y}_{i})=Pr({R}_{i}=1\mid {Z}_{i},{\mathbf{U}}_{i}),$$

(2)

where *R _{i}* = 1 if

In the absence of missing data, local polynomial kernel estimating equations have been proposed by Carroll, Ruppert, and Welsh (1998) as an extension of local likelihood estimation. When the data are not fully observed, one naive estimation approach is to simply solve the local polynomial kernel estimating equations using only completely observed units. However, as we show in Theorem 1 in Section 4, the resulting estimator $\widehat{\theta}_{\text{naive}}(z)$ is generally inconsistent under MAR, except when: (a) the conditional mean *E*(*Y*|*Z*, **U**) depends at most on *Z* or, (b) the selection probability Pr(*R* = 1|*Z*, **U**) depends at most on *Z*. This result is not surprising once we connect our inferential problem to causal inference objectives and relate it to well-known facts in causality. The MAR assumption (2) is equivalent to the assumption of no unmeasured confounding (Robins et al. 1999) or ignorability (Rubin 1976) for the potential outcome under treatment *R* = 1 in the subpopulation with *Z* = *z*. This assumption stipulates that, conditional on *Z* = *z*, **U** are the only variables that can simultaneously be (i) correlates of the outcome within treatment level and (ii) predictors of treatment *R* = 1. When (a) or (b) holds, either (i) or (ii) is violated. In such a case, the effect of *R* = 1 on *Y* is unconfounded and consequently naive conventional, that is, unadjusted, estimators of the association of *Y* with *R* = 1 conditional on *Z* = *z* are consistent estimators of the causal estimand of interest. In fact, when (b) holds but (a) is false, the naive estimator will be consistent but inefficient because it fails to exploit the information about *E*(*Y*|*Z* = *z*) in the auxiliary variables **U**. Thus, even in such a setting it is desirable to develop alternative, more efficient, estimation procedures. The Augmented Inverse Probability Weighted (AIPW) kernel estimators developed in this paper address this issue.

When the outcomes are missing at random, Robins, Rotnitzky, and Zhao (1995) and Rotnitzky and Robins (1995) proposed an inverse probability weighted (IPW) estimating equation for parametric regression, that is, when *θ*(·) is parametrically modeled as *θ*(·; ** ν**) indexed by a finite dimensional parameter vector

Specifically, let *K _{h}*(

$${\pi}_{i0}=\pi ({Z}_{i},{\mathbf{U}}_{i};\mathit{\tau}),$$

(3)

where *π*(*Z*, **U**; ** τ**) is a known smooth function of an unknown finite dimensional parameter vector

$$\sum _{i=1}^{n}\{{U}_{\text{IPW},i}(\alpha )-{A}_{i}(\alpha )\}=0,$$

(4)

where

$$\begin{array}{l}{U}_{\text{IPW},i}(\alpha )=\frac{{R}_{i}}{{\widehat{\pi}}_{i}}{K}_{h}({Z}_{i}-z){\mu}_{i}^{(1)}{V}_{i}^{-1}\mathbf{G}({Z}_{i}-z)\times [{Y}_{i}-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}]\\ {A}_{i}(\alpha )=\left(\frac{{R}_{i}}{{\widehat{\pi}}_{i}}-1\right){K}_{h}({Z}_{i}-z){\mu}_{i}^{(1)}{V}_{i}^{-1}\mathbf{G}({Z}_{i}-z)\times [\delta ({Z}_{i},{\mathbf{U}}_{i})-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}]\end{array}$$

(5)

with
${\mu}_{i}^{(1)}$ is the first derivative of *μ*(·) evaluated at **G**(*Z _{i}* −

In the AIPW kernel estimating equations (4), the term *U*_{IPW,}* _{i}*(

Two key properties, formally proved in Section 4, make the AIPW kernel estimating equation methodology appealing, namely: (1) exploitation of the information in the auxiliary variables of subjects with missing outcomes and (2) double robustness.
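As a minimal sketch of how the estimating equations (4) and (5) can be solved, consider the special case of an identity link with a constant working variance and a local linear basis **G**(*Z* − *z*) = (1, *Z* − *z*)^T. In that case the naive and IPW equations are kernel-weighted least squares problems with weights *R* and *R*/*π*, and the AIPW equations reduce to a local linear fit of the pseudo-outcome *RY*/*π* − (*R*/*π* − 1)*δ* over all *n* subjects. The data-generating model, the Gaussian kernel, the bandwidth, the cubic working model for *δ*, and all function and variable names below are illustrative assumptions, not the authors' implementation; the selection probabilities are taken as known by design.

```python
import numpy as np

def local_linear(z0, z, y, w, h):
    """Solve sum_i w_i K_h(z_i - z0) G_i (y_i - G_i^T alpha) = 0 and return alpha_0."""
    d = z - z0
    k = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))  # Gaussian K_h
    G = np.column_stack([np.ones_like(d), d])                   # local linear basis
    wk = w * k
    A = G.T @ (wk[:, None] * G)
    b = G.T @ (wk * y)
    return np.linalg.solve(A, b)[0]

rng = np.random.default_rng(0)
n, h, z0 = 20000, 0.1, 0.5

# Hypothetical data-generating model (not the paper's simulation design).
Z = rng.uniform(0, 1, n)
U = rng.uniform(0, 6, n)
Y = np.sin(2 * np.pi * Z) + 1.3 * U + rng.normal(0, np.sqrt(3), n)
pi = 1 / (1 + np.exp(-0.5 * (U - 3)))      # selection depends on U only (MAR)
R = rng.binomial(1, pi)
Yobs = np.where(R == 1, Y, 0.0)            # missing outcomes are never used

# Working outcome model delta(Z, U): OLS on complete cases, cubic in Z plus U.
X = np.column_stack([np.ones(n), Z, Z**2, Z**3, U])
coef = np.linalg.lstsq(X[R == 1], Yobs[R == 1], rcond=None)[0]
delta = X @ coef

naive = local_linear(z0, Z, Yobs, R.astype(float), h)
ipw = local_linear(z0, Z, Yobs, R / pi, h)
pseudo = R * Yobs / pi - (R / pi - 1.0) * delta   # AIPW pseudo-outcome
aipw = local_linear(z0, Z, pseudo, np.ones(n), h)

truth = np.sin(2 * np.pi * z0) + 1.3 * 3.0        # E(Y | Z = z0) under this model
print(f"truth={truth:.2f} naive={naive:.2f} ipw={ipw:.2f} aipw={aipw:.2f}")
```

In this sketch the naive fit is biased upward because subjects with larger *U*, and hence larger *Y*, are more likely to be observed, whereas the IPW and AIPW fits target *E*(*Y*|*Z* = *z*).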

Informally, property (1) is seen because both the subjects with complete data and those with missing outcomes in a local neighborhood of *Z* = *z* have a nonnegligible contribution to the AIPW kernel estimating equations. Consider the alternative IPW kernel estimator _{IPW}(*z*), which is obtained by simply solving the IPW kernel estimating equations Σ_{i} U_{IPW,}* _{i}*(

To construct AIPW estimators with property (2), the double-robustness, we specify a parametric model

$$E({Y}_{i}\mid {Z}_{i},{\mathbf{U}}_{i})=\delta ({Z}_{i},{\mathbf{U}}_{i};\mathit{\eta}),$$

(6)

where ** η** is an unknown finite dimensional parameter vector, and we estimate

In addition, in Section 4 we show that the preceding double-robust estimator _{AIPW}(*z*) has an additional desirable property. Specifically, if model (6) is correctly specified then the double-robust estimator _{AIPW}(*z*) has the smallest asymptotic variance among all estimators solving AIPW kernel estimating equations with *π _{i}*

Our estimators _{AIPW}(*z*) use the IPW method of moments estimator of the variance parameter ** ζ**. Although one could construct an AIPW method of moments estimator of

In this section, we investigate the asymptotic properties of the AIPW local linear kernel estimator introduced in the preceding section and compare it with the naive and IPW nonparametric estimators. In our developments we make the following assumptions: (I) *n* → ∞, *h* → 0, and *nh* → ∞; (II) *z* is in the interior of the support of *Z*; and (III) the regularity conditions (i) and (ii) stated at the beginning of the web Appendix hold.

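Denote by $\tilde{\theta}_{\text{naive}}(z)$, $\tilde{\theta}_{\text{IPW}}(z)$, $\tilde{\theta}_{\text{AIPW}}(z)$ the asymptotic limits of $\widehat{\theta}_{\text{naive}}(z)$, $\widehat{\theta}_{\text{IPW}}(z)$, $\widehat{\theta}_{\text{AIPW}}(z)$. The AIPW kernel estimator $\widehat{\theta}_{\text{AIPW}}(z)$ solves (4). The IPW kernel estimator $\widehat{\theta}_{\text{IPW}}(z)$ solves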
${\sum}_{i=1}^{n}{U}_{\text{IPW},i}(\mathit{\alpha})=0$, where *U*_{IPW,}* _{i}*(

$$E[R{\mu}^{(1)}\{{\stackrel{\sim}{\theta}}_{\text{naive}}(z)\}{V}^{-1}\{{\stackrel{\sim}{\theta}}_{\text{naive}}(z);\stackrel{\sim}{\mathit{\zeta}}\}\times [Y-\mu \{{\stackrel{\sim}{\theta}}_{\text{naive}}(z)\}]\mid Z=z]=0,$$

(7)

where $\tilde{\zeta}$ is the probability limit of $\widehat{\zeta}$.

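Likewise, the IPW kernel estimating equations should have a sequence of solutions ($\widehat{\alpha}_{0,\text{IPW}}$, $\widehat{\alpha}_{1,\text{IPW}}$) that converge in probability to a vector ($\tilde{\alpha}_{0,\text{IPW}}$, $\tilde{\alpha}_{1,\text{IPW}}$) with the first component $\tilde{\alpha}_{0,\text{IPW}}$, throughout denoted as $\tilde{\theta}_{\text{IPW}}(z)$, satisfying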

$$E\left[\frac{R}{\stackrel{\sim}{\pi}}{\mu}^{(1)}\{{\stackrel{\sim}{\theta}}_{\text{IPW}}(z)\}{V}^{-1}\{{\stackrel{\sim}{\theta}}_{\text{IPW}}(z);\stackrel{\sim}{\mathit{\zeta}}\}\times [Y-\mu \{{\stackrel{\sim}{\theta}}_{\text{IPW}}(z)\}]|Z=z\right]=0,$$

(8)

where *π̃* = *π*(*Z*, **U**; $\tilde{\tau}$), and $\tilde{\tau}$ is the probability limit of $\widehat{\tau}$.

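Similarly, the AIPW kernel estimating equations (4) should have a sequence of solutions ($\widehat{\alpha}_{0,\text{AIPW}}$, $\widehat{\alpha}_{1,\text{AIPW}}$) that converge in probability to a vector ($\tilde{\alpha}_{0,\text{AIPW}}$, $\tilde{\alpha}_{1,\text{AIPW}}$) with the first component $\tilde{\alpha}_{0,\text{AIPW}}$, throughout denoted as $\tilde{\theta}_{\text{AIPW}}(z)$, satisfying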

$$\begin{array}{l}E\left[\frac{R}{\stackrel{\sim}{\pi}}{\mu}^{(1)}\{{\stackrel{\sim}{\theta}}_{\text{AIPW}}(z)\}{V}^{-1}\{{\stackrel{\sim}{\theta}}_{\text{AIPW}}(z);\stackrel{\sim}{\mathit{\zeta}}\}\times [Y-\mu \{{\stackrel{\sim}{\theta}}_{\text{AIPW}}(z)\}]|Z=z\right]\\ +E\left\{\left(\frac{R}{\stackrel{\sim}{\pi}}-1\right){\mu}^{(1)}\{{\stackrel{\sim}{\theta}}_{\text{AIPW}}(z)\}{V}^{-1}\{{\stackrel{\sim}{\theta}}_{\text{AIPW}}(z);\stackrel{\sim}{\mathit{\zeta}}\}\times [\stackrel{\sim}{\delta}(Z,\mathbf{U})-\mu \{{\stackrel{\sim}{\theta}}_{\text{AIPW}}(z)\}]|Z=z\right\}=0,\end{array}$$

(9)

where $\tilde{\delta}(Z,\mathbf{U})$ = *δ*(*Z*, **U**; $\tilde{\eta}$), and $\tilde{\eta}$ is the probability limit of $\widehat{\eta}$.

Throughout we assume that such sequences exist. Theorem 1 exploits the form of (7), (8), and (9) to derive concise expressions for the probability limits of $\widehat{\theta}_{\text{naive}}(z)$, $\widehat{\theta}_{\text{IPW}}(z)$, and $\widehat{\theta}_{\text{AIPW}}(z)$ under MAR.

Under the MAR assumption (2), the following results hold:

- The probability limit $\tilde{\theta}_{\text{naive}}(z)$ of the naive kernel estimator defined in (7) satisfies $\tilde{\theta}_{\text{naive}}(z)$ = *μ*^{−1}[*μ*{*θ*(*z*)} + cov(*R*, *Y*|*Z* = *z*)/*E*(*R*|*Z* = *z*)];
- The probability limit $\tilde{\theta}_{\text{AIPW}}(z)$ of the AIPW kernel estimator defined in (9) satisfies $\tilde{\theta}_{\text{AIPW}}(z)$ = *θ*(*z*) when the AIPW kernel estimating equations (4) use either (i) the true *π*_{i0} or $\widehat{\pi}_{i}$ computed under a correctly specified model (3); or (ii) *δ*(*Z*, **U**) = *E*(*Y*|*Z*, **U**), or *δ*(*Z*, **U**) = *δ*(*Z*, **U**; $\widehat{\eta}$) with $\widehat{\eta}$ calculated under a correctly specified model (6).

The proof of Theorem 1 is given in web Appendix A.1. It follows from Theorem 1 that $\widehat{\theta}_{\text{naive}}(z)$ is generally inconsistent for *θ*(*z*) except when *R* and *Y* are conditionally uncorrelated given *Z*. In particular, this implies that when missingness depends on variables **U** other than *Z* which further predict *Y*, $\widehat{\theta}_{\text{naive}}(z)$ is inconsistent. However, if either of the following two conditions holds, then cov(*R*, *Y*|*Z* = *z*) = 0 and therefore $\widehat{\theta}_{\text{naive}}(z)$ is consistent for *θ*(*z*). Specifically:

- *Condition a.* The missing indicator *R* depends on the covariate *Z* but given *Z* it is conditionally independent of auxiliary variables **U**.
- *Condition b.* The conditional mean of *Y* given *Z* and **U** depends only on *Z*.
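The expression for the naive limit in Theorem 1 can be traced back to (7) in one step. As a sketch, and assuming the factors $\mu^{(1)}\{\tilde{\theta}_{\text{naive}}(z)\}$ and $V^{-1}\{\tilde{\theta}_{\text{naive}}(z);\tilde{\zeta}\}$ are nonzero, note that they are constants given *Z* = *z*, so (7) reduces to $E[R(Y-\mu\{\tilde{\theta}_{\text{naive}}(z)\})\mid Z=z]=0$. Solving for the mean gives

$$\mu\{\tilde{\theta}_{\text{naive}}(z)\}=\frac{E(RY\mid Z=z)}{E(R\mid Z=z)}=E(Y\mid Z=z)+\frac{\text{cov}(R,Y\mid Z=z)}{E(R\mid Z=z)},$$

where the second equality uses $E(RY\mid Z=z)=\text{cov}(R,Y\mid Z=z)+E(R\mid Z=z)E(Y\mid Z=z)$. Since $E(Y\mid Z=z)=\mu\{\theta(z)\}$, applying $\mu^{-1}$ recovers the stated limit, and the bias term disappears exactly when the conditional covariance is zero.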

Theorem 1, part (III) shows that the AIPW kernel estimator _{AIPW}(*z*) has the remarkable double-robustness property alluded to in the preceding section: its consistency requires the correct specification of either a model for *π _{i}*

In what follows, we study the asymptotic distributions of the proposed estimators. Theorem 2 and Theorem 3 provide the asymptotic bias and variance of _{IPW}(*z*) and _{AIPW}(*z*), respectively, under MAR. Corollaries following these theorems show that in the class of AIPW kernel estimating equations that use either the true *π _{i}*

Suppose * _{i}* is computed under a correctly specified model (3) or is replaced by its true value. Suppose Pr(

$$\sqrt{nh}\{{\widehat{\theta}}_{\text{IPW}}(z)-\theta (z)-\frac{1}{2}{h}^{2}{\theta}^{\prime\prime}(z){c}_{2}(K)+o({h}^{2})\}\to \text{N}\{0,\phantom{\rule{0.16667em}{0ex}}{W}_{\text{IPW}}(z)\},$$

(10)

where

$$\begin{array}{l}{W}_{\text{IPW}}(z)\equiv {b}_{K}(z)E\phantom{\rule{0.16667em}{0ex}}\left[{\left[\frac{R}{{\pi}_{0}(Z,\mathbf{U})}(Y-\mu \{\theta (Z)\})\right]}^{2}|Z=z\right]\\ ={b}_{K}(z)E\phantom{\rule{0.16667em}{0ex}}\left[\frac{var(Y\mid Z,\mathbf{U})+{[E(Y\mid Z,\mathbf{U})-\mu \{\theta (Z)\}]}^{2}}{{\pi}_{0}(Z,\mathbf{U})}|Z=z\right].\end{array}$$

Theorem 2 shows that the asymptotic bias of $\widehat{\theta}_{\text{IPW}}(z)$ is of order *O*(*h*^{2}), and the variance of $\widehat{\theta}_{\text{IPW}}(z)$ is of order *O*(1/*nh*) and does not depend on the working variance *V*(·) in the IPW kernel estimating equations. This result indicates that, in contrast to parametric regression estimation, misspecification of the working variance *V*(·) of *Y*|*Z* does not affect the asymptotic variance of $\widehat{\theta}_{\text{IPW}}(z)$. Theorem 2 also shows that to this order the bias and variance do not depend on whether the selection probabilities are known or estimated parametrically.
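The second equality in the definition of *W*_{IPW}(*z*) can be checked directly. As a sketch, write $\mu=\mu\{\theta(Z)\}$ and $\pi_0=\pi_0(Z,\mathbf{U})$, use $R^2=R$, and condition first on (*Z*, **U**); under MAR (2), *R* and *Y* are independent given (*Z*, **U**), so

$$\begin{array}{l}E\left[\frac{R}{{\pi}_{0}^{2}}{(Y-\mu)}^{2}\mid Z=z\right]=E\left[\frac{E(R\mid Z,\mathbf{U})}{{\pi}_{0}^{2}}E\{{(Y-\mu)}^{2}\mid Z,\mathbf{U}\}\mid Z=z\right]\\ =E\left[\frac{var(Y\mid Z,\mathbf{U})+{\{E(Y\mid Z,\mathbf{U})-\mu\}}^{2}}{{\pi}_{0}}\mid Z=z\right],\end{array}$$

where the last step uses $E(R\mid Z,\mathbf{U})=\pi_0$ and the usual decomposition of a mean squared deviation into a variance plus a squared bias.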

Suppose that in the AIPW kernel estimating equations (4), (a) * _{i}* is computed under a model (3) or it is replaced by fixed probabilities
${\pi}_{i}^{\ast}\equiv {\pi}^{\ast}({Z}_{i},{\mathbf{U}}_{i})$ and (b)

If either (i) or (ii) (but not necessarily both) hold, then

$$\sqrt{nh}\{{\widehat{\theta}}_{\text{AIPW}}(z)-\theta (z)-\frac{1}{2}{h}^{2}{\theta}^{\prime\prime}(z){c}_{2}(K)+o({h}^{2})\}\to \text{N}\{0,\phantom{\rule{0.16667em}{0ex}}{W}_{\text{AIPW}}(z)\},$$

(11)

where

$${W}_{\text{AIPW}}(z)={b}_{K}(z)E\phantom{\rule{0.16667em}{0ex}}\left[{\left[\frac{R}{\stackrel{\sim}{\pi}(Z,\mathbf{U})}(Y-\mu \{\theta (Z)\})-\left(\frac{R}{\stackrel{\sim}{\pi}(Z,\mathbf{U})}-1\right)\phantom{\rule{0.16667em}{0ex}}(\stackrel{\sim}{\delta}(Z,\mathbf{U})-\mu \{\theta (z)\})\right]}^{2}|Z=z\right],$$

(12)

*π̃* (*Z*, **U**) denotes *π*^{*}(*Z*, **U**) if
${\pi}_{i}^{\ast}$ is used, or it denotes the probability limit of (*Z*, **U**) if * _{i}* is used, and (

Theorem 3 shows that the leading term of the asymptotic bias of $\widehat{\theta}_{\text{AIPW}}(z)$ is the same as that of $\widehat{\theta}_{\text{IPW}}(z)$ when the model for the selection probability is correctly specified. Furthermore, it remains the same even when the model for the selection probability is wrong, as long as the model for the conditional mean of the outcome given covariates and auxiliaries is correctly specified. Display (12) provides the general form of the asymptotic variance of $\widehat{\theta}_{\text{AIPW}}(z)$ when either model (3) or model (6) is correctly specified. If model (6) is correctly specified, then (12) simplifies to *b _{K}* (

On the other hand, if model (3) for the selection probability is correctly specified, the following corollary explores the properties of *W*_{AIPW}(*z*) and it establishes that among the AIPW kernel estimating equations, the one that uses *δ*(*Z _{i}*,

Under the assumptions of Theorem 3, if the selection probability model (3) is correctly specified, then

$${W}_{\text{AIPW}}(z)={b}_{K}(z)E\left[\frac{1}{{\pi}_{0}(Z,\mathbf{U})}var(Y\mid Z,\mathbf{U})+{[E(Y\mid Z,\mathbf{U})-\mu \{\theta (Z)\}]}^{2}+\left(\frac{1}{{\pi}_{0}(Z,\mathbf{U})}-1\right)\times {\{E(Y\mid Z,\mathbf{U})-\stackrel{\sim}{\delta}(Z,\mathbf{U})\}}^{2}|Z=z\right].$$

(13)

*W*_{AIPW}(*z*) is minimized at $\tilde{\delta}(Z,\mathbf{U})$ = *E*(*Y*|*Z*, **U**). Consequently, when model (3) is correct, the estimator $\widehat{\theta}_{\text{AIPW}}(z)$ that uses *δ*(*Z*, **U**) = *δ*(*Z*, **U**; $\widehat{\eta}$) from a correctly specified model for *E*(*Y*|*Z*, **U**), throughout denoted as _{opt,}* _{AIPW}* (

$${W}_{\text{opt},\text{AIPW}}(z)={b}_{K}(z)E\left\{\frac{var(Y\mid Z,\mathbf{U})}{{\pi}_{0}(Z,\mathbf{U})}+{[E(Y\mid Z,\mathbf{U})-\mu \{\theta (Z)\}]}^{2}|Z=z\right\}.$$

Note that it follows from (13) that *W*_{AIPW}(*z*) agrees with *W*_{IPW}(*z*) when $\tilde{\delta}(Z,\mathbf{U})$ = *μ*{*θ*(*Z*)}. This implies that, under correct specification of the selection probability model, the AIPW estimators that use *δ*(*Z*, **U**) equal to the fitted value *δ*(*Z*; $\widehat{\omega}$) from a parametric model *δ*(*Z; ω*) for *E*(*Y*|*Z*), rather than the fitted value from a parametric model for *E*(*Y*|*Z*, **U**), are asymptotically equivalent to IPW estimators.

A direct comparison of the asymptotic variance of $\widehat{\theta}_{\text{opt},\text{AIPW}}(z)$ to that of $\widehat{\theta}_{\text{IPW}}(z)$ in Theorem 2 immediately gives that the optimal AIPW kernel estimator is always at least as efficient as the IPW kernel estimator when indeed model (6) is correctly specified, as the next corollary establishes.

Suppose that _{opt,AIPW}(*z*) and _{IPW}(*z*) solve, respectively, the optimal AIPW and IPW kernel estimating equations that use the true *π _{i}*

$${W}_{\text{IPW}}(z)-{W}_{\text{opt},\text{AIPW}}(z)={b}_{K}(z)E\left[\left(\frac{1}{{\pi}_{0}(Z,\mathbf{U})}-1\right)\times {[E(Y\mid Z,\mathbf{U})-\mu \{\theta (Z)\}]}^{2}|Z=z\right].$$

When Pr[*π*_{0}(*Z*, **U**) < 1] > 0, the difference *W*_{IPW}(*z*) − *W*_{opt,AIPW}(*z*) is 0 only when *E*(*Y*|*Z* = *z*, **U**) − *E*(*Y*|*Z* = *z*) = 0, that is, when **U** does not predict *Y* in addition to *Z*. When **U** predicts *Y* above and beyond *Z*, as is expected for covariates **U** usually recorded in epidemiological studies, *W*_{IPW}(*z*) − *W*_{opt,AIPW}(*z*) is strictly positive. Thus $\widehat{\theta}_{\text{opt},\text{AIPW}}(z)$ is usually more efficient than $\widehat{\theta}_{\text{IPW}}(z)$.
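The variance difference above follows from subtracting the two variance expressions term by term. As a sketch, abbreviate $v=var(Y\mid Z,\mathbf{U})$, $b=E(Y\mid Z,\mathbf{U})-\mu\{\theta(Z)\}$ and $\pi_0=\pi_0(Z,\mathbf{U})$. The integrand of *W*_{IPW}(*z*) in Theorem 2 is $(v+b^2)/\pi_0$ and that of *W*_{opt,AIPW}(*z*) is $v/\pi_0+b^2$, so

$$\frac{v+{b}^{2}}{{\pi}_{0}}-\left(\frac{v}{{\pi}_{0}}+{b}^{2}\right)=\left(\frac{1}{{\pi}_{0}}-1\right){b}^{2}\ge 0.$$

The same abbreviations show why (13) reduces to *W*_{IPW}(*z*) when the augmentation uses $\mu\{\theta(Z)\}$ in place of $E(Y\mid Z,\mathbf{U})$: the last term of (13) becomes $(1/\pi_0-1)b^2$, and $v/\pi_0+b^2+(1/\pi_0-1)b^2=(v+b^2)/\pi_0$.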

A warning is appropriate at this stage. Our results show that using the optimal augmentation term we improve upon the efficiency of the IPW estimator. However, it is not guaranteed that any augmentation term in the AIPW kernel estimating equation leads to efficiency gains over the IPW method. In practice, one often does not know whether model (6) is correct, and hence is uncertain whether $\widehat{\theta}_{\text{AIPW}}(z)$ is more efficient than $\widehat{\theta}_{\text{IPW}}(z)$. Nevertheless we can follow a strategy proposed by Tan (2006) for estimation of the marginal mean of an outcome and remedy this problem. Specifically, the following simple modification results in an AIPW kernel estimating function that yields double-robust estimators guaranteed to be at least as efficient as the IPW estimator $\widehat{\theta}_{\text{IPW}}(z)$ and as the optimal AIPW estimator $\widehat{\theta}_{\text{opt},\text{AIPW}}(z)$ when model (3) holds for the selection probability. Let
${M}_{1i}(\alpha )={R}_{i}{\widehat{\pi}}_{i}^{-1}{V}_{i}^{-1}{K}_{h}({Z}_{i}-z)[{Y}_{i}-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}],{M}_{2i}(\alpha )=({R}_{i}{\widehat{\pi}}_{i}^{-1}-1){V}_{i}^{-1}{K}_{h}({Z}_{i}-z)[\delta ({Z}_{i},{\mathbf{U}}_{i})-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}],{M}_{3i}(\alpha )={R}_{i}{\widehat{\pi}}_{i}^{-1}({\widehat{\pi}}_{i}^{-1}-1){V}_{i}^{-2}{K}_{h}{({Z}_{i}-z)}^{2}{[\delta ({Z}_{i},{\mathbf{U}}_{i})-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}]}^{2}$ and
$\widehat{\kappa}(\mathit{\alpha})=\{{\sum}_{i=1}^{n}{M}_{1i}(\alpha ){M}_{2i}(\alpha )\}/\{{\sum}_{i=1}^{n}{M}_{3i}(\alpha )\}$. Let $\widehat{\alpha}_{\text{mod}}=\{\widehat{\alpha}_{\text{mod},0},\widehat{\alpha}_{\text{mod},1}\}$ solve

$$\sum _{i=1}^{n}\{\frac{{R}_{i}}{{\widehat{\pi}}_{i}}{K}_{h}({Z}_{i}-z){\mu}_{i}^{(1)}{V}_{i}^{-1}\mathbf{G}({Z}_{i}-z)[{Y}_{i}-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}]-\widehat{\kappa}(\mathit{\alpha})\left(\frac{{R}_{i}}{{\widehat{\pi}}_{i}}-1\right){K}_{h}({Z}_{i}-z){\mu}_{i}^{(1)}{V}_{i}^{-1}\mathbf{G}({Z}_{i}-z)\times [\delta ({Z}_{i},{\mathbf{U}}_{i})-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}]\}=0,$$

(14)

where
${V}_{i}^{-1}$ is evaluated at **. The proposed modified estimator is **_{mod}(*z*) = _{mod,0}. Note that (14) is just like the AIPW equation (4) except that the contribution to the augmentation term of each subject is multiplied by the factor (** α**). Remarkably, this modification ensures that the new estimator

$$\kappa =E\left[\frac{R}{\stackrel{\sim}{\pi}(Z,\mathbf{U})}(Y-\mu \{\theta (Z)\})\left(\frac{R}{\stackrel{\sim}{\pi}(Z,\mathbf{U})}-1\right)\times (\stackrel{\sim}{\delta}(Z,\mathbf{U})-\mu \{\theta (Z)\})|Z=z\right]/E\left[\frac{R}{\stackrel{\sim}{\pi}(Z,\mathbf{U})}\left(\frac{1}{\stackrel{\sim}{\pi}(Z,\mathbf{U})}-1\right)\times {(\stackrel{\sim}{\delta}(Z,\mathbf{U})-\mu \{\theta (Z)\})}^{2}|Z=z\right].$$

When model (3) is correct, then *π̃*(*Z*, **U**) = *π*_{0}(*Z*, **U**) and the second term of the left-hand side of (9) is zero, regardless of whether it is evaluated at the true *θ*(*z*) or not and regardless of whether or not it is multiplied by the constant *κ*, while the first term is unaffected by the modification and remains equal to zero when evaluated at *θ*(*z*). Thus $\widehat{\theta}_{\text{mod}}(z)$ is consistent for *θ*(*z*) when model (3) is correctly specified. On the other hand, when model (6) is correct, then $\tilde{\delta}(Z,\mathbf{U})$ = *E*(*Y*|*Z*, **U**) and a straightforward calculation shows that *κ* = 1 regardless of whether or not *π̃*(*Z*, **U**) is equal to Pr(*R* = 1|*Z*, **U**), thus implying that $\widehat{\theta}_{\text{mod}}(z)$ is consistent for *θ*(*z*) since, as we argued earlier, *θ*(*z*) solves Equation (9). This shows that $\widehat{\theta}_{\text{mod}}(z)$ is double-robust. To show that $\widehat{\theta}_{\text{mod}}(z)$ is at least as efficient as $\widehat{\theta}_{\text{opt},\text{AIPW}}(z)$ and as $\widehat{\theta}_{\text{IPW}}(z)$ when model (3) is correctly specified, we can argue as in the proof of Theorem 3 and show that $\widehat{\theta}_{\text{mod}}(z)$ has the same limiting distribution as $\widehat{\theta}_{\text{AIPW}}(z)$, except that the asymptotic variance *W*_{AIPW}(*z*) is replaced by

$${W}_{mod}(z)={b}_{K}(z)E\left[{\left\{\frac{R}{{\pi}_{0}(Z,\mathbf{U})}[Y-\mu \{\theta (Z)\}]-\kappa \times \left(\frac{R}{{\pi}_{0}(Z,\mathbf{U})}-1\right)[\stackrel{\sim}{\delta}(Z,\mathbf{U})-\mu \{\theta (Z)\}]\right\}}^{2}|Z=z\right].$$
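The claim that *κ* = 1 when model (6) is correct can be verified in two steps. As a sketch, write $\mu=\mu\{\theta(Z)\}$, $\tilde{\pi}=\tilde{\pi}(Z,\mathbf{U})$, $\tilde{\delta}=\tilde{\delta}(Z,\mathbf{U})=E(Y\mid Z,\mathbf{U})$, and assume the denominator of *κ* is nonzero. Because $R^2=R$, we have $R(R/\tilde{\pi}-1)=R(1/\tilde{\pi}-1)$, and under MAR (2), $E(Y-\mu\mid Z,\mathbf{U},R=1)=\tilde{\delta}-\mu$. The numerator of *κ* is therefore

$$E\left[\frac{R}{\stackrel{\sim}{\pi}}\left(\frac{1}{\stackrel{\sim}{\pi}}-1\right)(Y-\mu)(\stackrel{\sim}{\delta}-\mu)\mid Z=z\right]=E\left[\frac{R}{\stackrel{\sim}{\pi}}\left(\frac{1}{\stackrel{\sim}{\pi}}-1\right){(\stackrel{\sim}{\delta}-\mu)}^{2}\mid Z=z\right],$$

which is exactly the denominator, so the ratio equals 1 whether or not $\tilde{\pi}$ is the true selection probability.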

A straightforward calculation yields that the denominator of *κ* is equal to

$$E\left[{\left\{\frac{R}{{\pi}_{0}(Z,\mathbf{U})}-1\right\}}^{2}{[\stackrel{\sim}{\delta}(Z,\mathbf{U})-\mu \{\theta (Z)\}]}^{2}|Z=z\right].$$

Thus, *W*_{mod}(*z*) is equal to *b _{K}* (

Choosing an appropriate bandwidth parameter *h* is important in nonparametric regression. From Theorems 2 and 3, the asymptotically optimal bandwidths *h*_{IPW,opt} and *h*_{AIPW,opt} can be chosen by minimizing the corresponding asymptotic weighted mean integrated squared errors, respectively. Specifically, the asymptotically optimal bandwidth for the estimator $\widehat{\theta}_{\text{IPW}}(z)$ is given by *h*_{IPW,opt} = [{4 ∫*W*_{IPW}(*z*) *dz*}/{*c*_{2}(*K*)∫*θ*″(*z*) *dz*}]^{1/5}*n*^{−1/5} and the asymptotically optimal bandwidth for the estimator $\widehat{\theta}_{\text{AIPW}}(z)$ is given by *h*_{AIPW,opt} = [{4∫*W*_{AIPW}(*z*) *dz*}/{*c*_{2}(*K*)∫*θ*″(*z*)*dz*}]^{1/5}*n*^{−1/5}.

To choose *h* in practice, we can easily generalize the empirical bias bandwidth selection (EBBS) method of Ruppert (1997) to derive a data-driven bandwidth selection approach for nonparametric regression with missing data. Specifically, one calculates the empirical mean squared error EMSE{*z; h*(*z*)} of $\widehat{\theta}(z)$, where
$\text{EMSE}\{z;h(z)\}=\widehat{\text{bias}}{\{\widehat{\theta}(z)\}}^{2}+\widehat{var}\{\widehat{\theta}(z)\}$, at a series of *z* and *h*(*z*) and chooses *h*(*z*) to minimize EMSE{*z; h*(*z*)}. Note that *h*(*z*) is chosen to vary with *z*, and thus is local. Here
$\widehat{\text{bias}}\{\widehat{\theta}(z)\}$ is the empirical bias, and
$\widehat{\text{var}}\{\widehat{\theta}(z)\}$ is the sandwich variance estimator. For example, the sandwich variance estimator of the IPW kernel estimator $\widehat{\theta}_{\text{IPW}}(z)$ can be calculated as the (1, 1) element of the matrix (**A**_{IPW})^{−1}**B**_{IPW}(**A**_{IPW})^{−1}, where

$${\mathbf{B}}_{\text{IPW}}=\frac{1}{n}\sum _{i=1}^{n}{\left\{\frac{{R}_{i}}{{\widehat{\pi}}_{i}}{K}_{h}({Z}_{i}-z){\mu}_{i}^{(1)}{V}_{i}^{-1}[{Y}_{i}-\mu \{\mathbf{G}{({Z}_{i}-z)}^{T}\mathit{\alpha}\}]\right\}}^{2}\times \mathbf{G}({Z}_{i}-z)\mathbf{G}{({Z}_{i}-z)}^{T}$$

and

$${\mathbf{A}}_{\text{IPW}}=\frac{1}{n}\sum _{i=1}^{n}\frac{{R}_{i}}{{\widehat{\pi}}_{i}}{K}_{h}({Z}_{i}-z){\{{\mu}_{i}^{(1)}\}}^{2}{V}_{i}^{-1}\mathbf{G}({Z}_{i}-z)\mathbf{G}{({Z}_{i}-z)}^{T}.$$

The sandwich variance estimator of the naive kernel estimator $\widehat{\theta}_{\text{naive}}(z)$, and of the AIPW kernel estimator $\widehat{\theta}_{\text{AIPW}}(z)$ can be constructed in a similar way.
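The matrices **A**_{IPW} and **B**_{IPW} above can be computed in a few lines. The following is a minimal sketch for the identity link with unit working variance (so that *μ*^{(1)} = 1 and *V* = 1), a Gaussian kernel, and selection probabilities known by design; the simulated data and all variable names are hypothetical. One assumption deserves emphasis: because both matrices carry a 1/*n* factor, the sketch divides the (1, 1) element of the sandwich matrix by *n* to put it on the scale of the variance of the estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, z0 = 20000, 0.1, 0.5

# Hypothetical data: selection depends on the auxiliary U only (MAR).
Z = rng.uniform(0, 1, n)
U = rng.uniform(0, 6, n)
Y = np.sin(2 * np.pi * Z) + 1.3 * U + rng.normal(0, np.sqrt(3), n)
pi = 1 / (1 + np.exp(-0.5 * (U - 3)))
R = rng.binomial(1, pi)
w = R / pi                        # IPW weights; zero whenever Y is missing

d = Z - z0
K = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))  # K_h(Z_i - z)
G = np.column_stack([np.ones(n), d])                        # G(Z_i - z)

# A_IPW = (1/n) sum_i (R_i/pi_i) K_h G G^T, then the IPW local linear fit.
A = (G * (w * K)[:, None]).T @ G / n
alpha = np.linalg.solve(A, (G * (w * K)[:, None]).T @ Y / n)

# B_IPW = (1/n) sum_i {(R_i/pi_i) K_h (Y_i - G_i^T alpha)}^2 G G^T.
s = w * K * (Y - G @ alpha)
B = (G * (s ** 2)[:, None]).T @ G / n

A_inv = np.linalg.inv(A)
sandwich = A_inv @ B @ A_inv
se = np.sqrt(sandwich[0, 0] / n)  # assumed scaling: divide by n
print(f"theta_hat={alpha[0]:.3f} se={se:.3f}")
```

Outcomes of subjects with *R* = 0 enter only through terms multiplied by a zero weight, so they do not affect the result.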

In this section, we conduct simulation studies to evaluate the finite-sample performance of the AIPW kernel estimator $\widehat{\theta}_{\text{AIPW}}(z)$, and compare it with the naive kernel estimator $\widehat{\theta}_{\text{naive}}(z)$ and the IPW kernel estimator $\widehat{\theta}_{\text{IPW}}(z)$. Our simulation mimics the observed data generating process of a two-stage study design, in which **U** and *Z* are measured at the first stage on all study subjects, but *Y* is measured at the second stage only on a subset of the study participants. The second-stage validation subset is selected with selection probabilities that may depend on the first stage variables. We consider two situations, where the outcome *Y* is either normal or binary, respectively. We generate a random sample of size *n* of (*Z*, *U*, *Y*, *R*) for each replication. *Z* is generated from a uniform(0, 1) distribution, *U* is generated from a uniform(0, 6) independently of *Z*, and the mean of the outcome *Y* has the general form

$$g\{E(Y\mid Z,U)\}=m(Z)+{\beta}_{1}U.$$

(15)

In case one, *g*(*x*) = *x* and the outcome *Y* is generated from a normal distribution with mean *E*(*Y*|*Z*, *U*) and variance *σ*^{2} = 3, where *β*_{1} = 1.3, *m*(*x*) = 2 · *F*_{8,8}(*x*) and *F _{p}*

$$\text{logit}\{\pi ({Z}_{i},{U}_{i})\}={\tau}_{0}+{\tau}_{1}\cdot ({U}_{i}-{a}_{1})I({a}_{1}<{U}_{i}\le {a}_{2})+{\tau}_{1}\cdot ({a}_{2}-{a}_{1})I({U}_{i}>{a}_{2}),$$

(16)

where *π*(*Z _{i}*,
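The selection model (16) is a logistic model whose linear predictor is flat below *a*_{1}, rises linearly between *a*_{1} and *a*_{2}, and is flat again above *a*_{2}. The sketch below writes it out literally and checks that it matches an equivalent "clipped ramp" form. The function names and the numerical values of *τ*_{0}, *τ*_{1}, *a*_{1}, *a*_{2} are hypothetical placeholders chosen only for illustration.

```python
import numpy as np

def selection_prob(u, tau0, tau1, a1, a2):
    """Selection probability from model (16), written term by term."""
    lin = (tau0
           + tau1 * (u - a1) * ((u > a1) & (u <= a2))
           + tau1 * (a2 - a1) * (u > a2))
    return 1.0 / (1.0 + np.exp(-lin))

def selection_prob_ramp(u, tau0, tau1, a1, a2):
    """Equivalent form: a linear ramp in u clipped to the interval [a1, a2]."""
    lin = tau0 + tau1 * np.clip(u - a1, 0.0, a2 - a1)
    return 1.0 / (1.0 + np.exp(-lin))

# Hypothetical parameter values; U ranges over (0, 6) as in the simulation design.
tau0, tau1, a1, a2 = -0.5, 1.0, 2.0, 4.0
u_grid = np.linspace(0.0, 6.0, 601)
p_literal = selection_prob(u_grid, tau0, tau1, a1, a2)
p_ramp = selection_prob_ramp(u_grid, tau0, tau1, a1, a2)

rng = np.random.default_rng(2)
U = rng.uniform(0, 6, 1000)
R = rng.binomial(1, selection_prob(U, tau0, tau1, a1, a2))  # R depends on U only
```

Because the probability depends on the always-observed *U* and not on *Y*, missingness generated this way satisfies the MAR assumption (2).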

Our primary interest lies in estimating the marginal mean curve of the outcome *Y* given the scalar covariate *Z*, that is, *μ*{*θ*(*z*)}, which is *E*(*Y*|*Z*) = *E*[*E*(*Y*|*Z*, *U*)|*Z*]. We generated 500 datasets with sample size *n* = 500 or 300. For each simulated dataset, we computed the naive, IPW and AIPW estimates of *θ*(*z*), in the first case under the model *μ _{i}* =

The empirical average of the estimated nonparametric curves $\widehat{\theta}(\cdot)$ over the 500 replications, using the naive, IPW and AIPW estimators, are displayed in Figure 1. The plot in the left panel shows the estimators of *θ*(*z*) in case 1 (identity link) and the plot in the right panel shows the estimators in case 2 (logit link). The same trend was observed for both plots. The IPW and AIPW kernel estimates are close to the true curve *θ*(·), while the naive approach yields a biased estimate. Figure 2 illustrates the empirical point-wise variances of $\widehat{\theta}_{\text{IPW}}(\cdot)$ and $\widehat{\theta}_{\text{AIPW}}(\cdot)$ when *n* = 500, the top panel for the identity link case and the bottom panel for the logit link case. The figure shows that the AIPW estimator has a smaller point-wise variance than the IPW estimator.

Simulation results of the estimated nonparametric functions using naive, IPW, and AIPW kernel methods based on 500 replications with sample size *n* = 500. The left panel is for case 1 (identity link), while the right panel is for case 2 (logit link): —— **...**

Empirical point-wise variances of the IPW and AIPW estimates of *θ*(·), based on 500 replications with sample size *n* = 500. The top panel is for case 1 (identity link), while the bottom panel is for case 2 (logit link): —— **...**

Table 1 summarizes the performance of each nonparametric estimate using the integrated relative bias, the integrated empirical standard error (SE), the integrated estimated SE, and the empirical mean integrated squared error (MISE), over the support of *Z*. As predicted by theory, the naive kernel estimate has a much larger relative bias than the IPW and AIPW kernel estimates. Furthermore, the AIPW kernel estimate has a smaller variance and a smaller MISE than the corresponding IPW kernel estimate. For example, in the identity link case, the AIPW kernel estimate has about a 52% gain in MISE efficiency over the IPW kernel estimate when *n* = 500. In the logit link case, the MISE efficiency gain is about 7%. The larger efficiency gain of AIPW over IPW in case 1 (identity link) than in case 2 (logit link) can be explained by the fact that in case 1 the auxiliary variable *U* is highly correlated with the outcome *Y*, while in case 2 the correlation between *U* and *Y* is much lower.
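As a rough illustration of how integrated summaries of this kind can be computed from replicated curve estimates, the sketch below approximates the integrated squared bias, the integrated variance, and the MISE on a grid by the trapezoidal rule. The function names and the grid-based approximation are assumptions made for this example; the exact definitions behind Table 1 may differ.

```python
def trapezoid(values, grid):
    # Trapezoidal rule for a function tabulated on a (possibly uneven) grid.
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def integrated_summaries(curves, truth, grid):
    # curves[r][j]: estimate from replication r at grid point j.
    # truth[j]: true curve at grid point j.
    n_rep = len(curves)
    sq_bias, variance, mse = [], [], []
    for j in range(len(grid)):
        vals = [curve[j] for curve in curves]
        mean = sum(vals) / n_rep
        sq_bias.append((mean - truth[j]) ** 2)
        # Divisor n_rep, so that mse = sq_bias + variance holds exactly.
        variance.append(sum((v - mean) ** 2 for v in vals) / n_rep)
        mse.append(sum((v - truth[j]) ** 2 for v in vals) / n_rep)
    return {
        "integrated_sq_bias": trapezoid(sq_bias, grid),
        "integrated_variance": trapezoid(variance, grid),
        "mise": trapezoid(mse, grid),
    }
```

Comparing the `mise` entries of two estimators evaluated on the same replications and grid gives one way to express a relative efficiency gain.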

Simulation results of relative biases, SEs, and MISEs of the naive, IPW, and AIPW estimates of *θ*(*z*) based on 500 replications (Monte Carlo SEs in parentheses)

To check the double-robustness property of the AIPW estimator, we computed *θ̂*_{AIPW}(·) using (i) estimates of *π _{i}*

Simulation results of the IPW and AIPW estimates of *θ*(·) using an incorrectly specified *π* model and/or an incorrectly specified *δ* = *E*(*Y*|*Z*, **U**) model, based on 500 replications with sample size *n* = 500. The left panel is **...**

We applied the IPW kernel estimating equation and the AIPW kernel estimating equation, as well as the naive kernel estimating equation, to analyze the ACSUS data described in Section 1. In this illustrative example, our main interest is to investigate the effect of the baseline CD4 counts on the risk of hospitalization during the first year since enrollment into the study. Since the risk of hospitalization depends on various covariates, such as HIV status, treatments, race, and gender, but we only consider a marginal nonparametric mean model of the risk of hospital admission on baseline CD4 counts, we restricted our analysis to a subset of homogeneous subjects for illustrative purposes. Specifically, we limited our analysis to 219 white patients who were between 25 and 45 years old at entry. They were HIV infected or had AIDS and were treated with anti-retroviral drugs but not admitted to hospital at entry. The CD4 counts ranged from 4 to 1716 in this study cohort, with a median of 186 and an interquartile range of (70, 315). Health care records were used to determine hospitalization during the first year after study enrollment. Although lower CD4 counts are expected to be associated with a higher risk of hospitalization, the functional form of this association is unknown and might be nonlinear. As discussed in Section 1, about 40% of the patients did not have the first-year hospital admission data available. If the missing outcomes induce selection bias, the patients with first-year hospitalization information may not represent the original study cohort, and an analysis restricted to them may yield biased estimates.

Because the distribution of CD4 counts is highly skewed, we took a log transformation and defined *Z* = log(baseline CD4 count). The missing data model was fit using logistic regression with *Z* and the other covariates in Table 3, all of which are binary. The coefficient estimates and their SEs are shown in Table 3. Having insurance and help with transportation increase the chance of remaining in the study, while use of other medical practitioners, psychological counseling, having help at home, and a lower CD4 count are significantly associated with a higher chance of dropping out.
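To make the selection-model step concrete, the following sketch fits a logistic regression for the probability of being observed by Newton-Raphson and returns fitted selection probabilities. It is a simplified illustration, not the analysis code: it uses an intercept and a single covariate (the model in Table 3 has several), and the function names are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(xs, rs, iters=25):
    # Newton-Raphson for logit P(R = 1 | x) = b0 + b1 * x.
    # SIMPLIFICATION: one covariate only, so the Hessian is 2 x 2.
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, r in zip(xs, rs):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += r - p               # score for the intercept
            g1 += (r - p) * x         # score for the slope
            h00 += w                  # entries of X' W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X' W X)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fitted_selection_probs(xs, b0, b1):
    # Estimated probabilities of being observed, used as 1 / pi weights.
    return [expit(b0 + b1 * x) for x in xs]
```

In an analysis like the one described here, `xs` would hold a first-stage covariate such as *Z*, and `rs` the indicators of whether the first-year outcome was observed.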

Estimates of the logistic regression coefficients of the probability of being observed by the end of the first year in the ACSUS data

We fit the generalized nonparametric model (1) using logit(*μ _{i}*) =

The naive, IPW, and AIPW estimates of *θ*(log CD4 counts) on the log odds of one-year hospitalization in the ACSUS study. The upper left panel displays three estimates: – – – the naive kernel estimate, · · **...**

Because very few patients had a log CD4 count below 3, the kernel estimates are unstable in that range. We focus our discussion on the estimates of the curve when the log CD4 count is greater than 3. The IPW and AIPW estimates are similar, while the naive one underestimates the risk of hospitalization for most of the range of CD4 in our study cohort. Since patients having help at home are more likely to drop out and these patients are likely to be sicker, the patients who have the first-year hospital admission information available are actually a biased sample of the whole study population. Therefore, the naive approach using the complete cases directly leads to a biased estimate of the nonparametric function *θ*(*z*) and underestimates the risk of hospitalization. Our analysis using the IPW and the AIPW kernel estimating equations indicates that the risk of hospitalization decreases nonlinearly as CD4 count increases, with a change point. Specifically, when the CD4 count is relatively low (CD4 count < 90), the risk of being admitted to hospital remains fairly stable at about 25%. Once the CD4 count exceeds this threshold, the risk of hospitalization decreases quickly as the CD4 count goes up.

In this paper we proposed local polynomial kernel estimation methods for nonparametric regression when outcomes are missing at random. We showed that the naive local polynomial kernel estimator is generally inconsistent except for special cases. We proposed IPW and AIPW kernel estimating equations to correct for potential selection bias, with the ultimate goal of maximally exploiting the information in the observed data. Unlike in parametric regression, the augmentation term in the AIPW kernel estimating equations incorporates a kernel function. We showed that both the IPW and AIPW kernel estimators are consistent when the selection probabilities are known by design or consistently estimated. When the model for the selection probabilities is misspecified, the IPW kernel estimating equation fails to yield a consistent estimator. However, the AIPW kernel estimating equation still yields a consistent estimator of the regression function if a model for *E*(*Y*|*Z*, **U**) is correctly specified. This double robustness property of the AIPW approach gives investigators two chances to make valid inferences. The AIPW kernel estimating equation also has the potential to enhance the efficiency with which we estimate the nonparametric regression function. We have shown that within the AIPW estimating equation family, the optimal estimator is obtained by using the true selection probability or a consistent estimate of it, together with the augmentation term estimated from a correctly specified model for *E*(*Y*|*Z*, **U**). It is of future research interest to study whether this estimator is optimal in a bigger class of estimators. Another interesting topic of future investigation is the possibility of enhancing the efficiency of the IPW estimator via estimation of the missingness probabilities at nonparametric rates, for example, under generalized additive models rather than under parametric models.
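The double robustness property can be illustrated numerically in the identity link case. The sketch below assumes the AIPW estimating equation has the usual augmented form with a constant working variance, in which case it reduces algebraically to a local linear fit of the pseudo-outcome *RY*/*π* + (1 − *R*/*π*)*δ* on *Z* over all subjects. The data-generating model, the deliberately wrong constant working value for *π*, the kernel, and the bandwidth are all hypothetical, and *δ* is set to the true conditional mean rather than fitted.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def aipw_pseudo_outcome(y, r, pi, delta):
    # R * Y / pi + (1 - R / pi) * delta; y is ignored when r == 0.
    ipw_part = y / pi if r == 1 else 0.0
    return ipw_part + (1.0 - r / pi) * delta


def local_linear(z0, zs, ys, ws, h):
    # Weighted local linear fit at z0 with an Epanechnikov kernel.
    s0 = s1 = s2 = t0 = t1 = 0.0
    for z, y, w in zip(zs, ys, ws):
        d = z - z0
        t = d / h
        if abs(t) >= 1.0 or w == 0.0:
            continue
        k = w * 0.75 * (1.0 - t * t)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * y
        t1 += k * d * y
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def demo(n=20000, seed=0, z0=0.25, h=0.1, wrong_pi=0.5):
    # HYPOTHETICAL design: true selection depends on U, but the working
    # selection probability is a (misspecified) constant wrong_pi.
    rng = random.Random(seed)
    zs, ys_ipw, ws_ipw, pseudo = [], [], [], []
    for _ in range(n):
        z = rng.random()
        u = rng.uniform(0.0, 6.0)
        delta = math.sin(2.0 * math.pi * z) + 1.3 * u   # correct E(Y | Z, U)
        r = 1 if rng.random() < expit(u - 3.0) else 0
        y = rng.gauss(delta, math.sqrt(3.0)) if r == 1 else None
        zs.append(z)
        ys_ipw.append(y if r == 1 else 0.0)   # placeholder has weight 0
        ws_ipw.append(r / wrong_pi)
        pseudo.append(aipw_pseudo_outcome(y, r, wrong_pi, delta))
    truth = math.sin(2.0 * math.pi * z0) + 1.3 * 3.0    # E(Y | Z = z0)
    ipw = local_linear(z0, zs, ys_ipw, ws_ipw, h)
    aipw = local_linear(z0, zs, pseudo, [1.0] * n, h)
    return ipw, aipw, truth


if __name__ == "__main__":
    ipw, aipw, truth = demo()
    print("truth:", truth, "IPW (wrong pi):", ipw, "AIPW (wrong pi):", aipw)
```

In this setup the IPW fit with the wrong constant weight coincides with the complete-case fit and inherits its selection bias, whereas the augmented fit remains centered near the truth because the outcome model is correct.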

The IPW and AIPW kernel estimating equations provide consistent estimators when the selection probability model *π* is correctly specified and is bounded away from 0. In finite samples, when some *π*’s are close to 0, the IPW and AIPW estimators might not perform well. This is not surprising, as the very large weights associated with these very small *π*’s dramatically inflate the influence of a few observations, especially when the sample size is moderate, and make the results unstable. Special caution is hence needed when applying the proposed methods to studies in which the selection probability is very small for some sample units.
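One simple way to screen for this problem before fitting is to inspect the inverse probability weights directly. The sketch below reports the largest weight and Kish's effective sample size among the observed units; this is a generic diagnostic offered as an illustration, not a procedure proposed in this paper, and the function name is hypothetical.

```python
def weight_diagnostics(pis, rs):
    # Inverse probability weights for units with observed outcomes (R = 1).
    weights = [1.0 / p for p, r in zip(pis, rs) if r == 1]
    total = sum(weights)
    # Kish's effective sample size: (sum w)^2 / sum w^2. It equals the
    # number of observed units when all weights are equal and shrinks
    # toward 1 when a few weights dominate.
    ess = total * total / sum(w * w for w in weights)
    return {
        "n_observed": len(weights),
        "max_weight": max(weights),
        "effective_sample_size": ess,
    }
```

An effective sample size far below the number of observed units suggests that a handful of observations with very small *π* dominate the weighted fit.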

We have focused in this paper on nonparametric regression on a single scalar covariate when the outcome is missing at random. The proposed method can be extended to semiparametric regression, where some covariates are modeled parametrically and some covariates are modeled nonparametrically. The proposed methods can also be easily generalized to higher order local polynomial kernel regression and nonparametric regression with multiple covariates, for example, using generalized additive models. Extension of our work to these settings will be reported in a separate paper.

Wang and Lin’s research is partially supported by a grant from the National Cancer Institute (R37-CA–76404). Rotnitzky’s research is partially supported by grants R01-GM48704 and R01-AI051164 from the National Institutes of Health.

Supplementary materials for this article are available online. Please click the JASA link at http://pubs.amstat.org.

**Technical Proofs:** Regularity conditions and proofs for Theorems 1, 2, and 3 in Section 4. (webappendix.pdf)

Lu Wang, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109.

Andrea Rotnitzky, Department of Economics, Di Tella University, Buenos Aires, 1425, Argentina and Adjunct Professor, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.

Xihong Lin, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.

- Berk M, Maffeo C, Schur C. AHCPR Publication No 93-0019. Agency for Health Care Policy and Research; Rockville, MD: 1993. Research Design and Analysis Objectives, AIDS Cost and Services Utilization Survey (ACSUS) Reports, No. 1.
- Breslow N, Cain K. Logistic Regression for Two-Stage Case-Control Data. Biometrika. 1988;75:11–20.
- Carroll RJ, Ruppert D, Welsh AH. Local Estimating Equations. Journal of the American Statistical Association. 1998;93:214–227.
- Chen J, Fan J, Li K-H, Zhou H. Local Quasi-Likelihood Estimation With Data Missing at Random. Statistica Sinica. 2006;16:1044–1070.
- Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. London: Chapman & Hall; 1996.
- Harezlak J, Wang M, Christiani D, Lin X. Quantitative Quality-Assessment Techniques to Compare Fractionation and Depletion Methods in SELDI-TOF Mass Spectrometry Experiments. Bioinformatics. 2007;23:2441–2448.
- Hastie T, Tibshirani R. Generalized Additive Models. Boca Raton, FL: Chapman & Hall/CRC; 1990.
- Liang H, Wang S, Robins JM, Carroll RJ. Estimation in Partially Linear Models With Missing Covariates. Journal of the American Statistical Association. 2004;99:357–367.
- Little RJA, Rubin DB. Statistical Analysis With Missing Data. 2nd ed. New York: Wiley; 2002.
- McCullagh P, Nelder J. Generalized Linear Models. London: Chapman & Hall; 1989.
- Pepe MS. Inference Using Surrogate Outcome Data and a Validation Sample. Biometrika. 1992;79:355–365.
- Reilly M, Pepe MS. A Mean Score Method for Missing and Auxiliary Covariate Data in Regression Models. Biometrika. 1995;82:299–314.
- Robins JM. Proceedings of the Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association; 1999. Robust Estimation in Sequentially Ignorable Missing Data and Causal Inference Models; pp. 6–10.
- Robins JM, Rotnitzky A. Semiparametric Efficiency in Multivariate Regression Models With Missing Data. Journal of the American Statistical Association. 1995;90:122–129.
- Robins JM, Rotnitzky A, Zhao LP. Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. Journal of the American Statistical Association. 1994;89:846–866.
- Robins JM, Rotnitzky A, Zhao LP. Analysis of Semiparametric Regression Models for Repeated Outcomes in the Presence of Missing Data. Journal of the American Statistical Association. 1995;90:106–121.
- Rotnitzky A, Robins J. Semiparametric Regression Estimation in the Presence of Dependent Censoring. Biometrika. 1995;82(4):805–820.
- Rotnitzky A, Holcroft C, Robins JM. Efficiency Comparisons in Multivariate Multiple Regression With Missing Outcomes. Journal of Multivariate Analysis. 1997;61:102–128.
- Rubin DB. Inference and Missing Data. Biometrika. 1976;63:581–592.
- Ruppert D. Empirical-Bias Bandwidths for Local Polynomial Non-parametric Regression and Density Estimation. Journal of the American Statistical Association. 1997;92:1049–1062.
- Tan Z. A Distributional Approach for Causal Inference Using Propensity Scores. Journal of the American Statistical Association. 2006;101:1619–1637.
- Wand M, Jones M. Kernel Smoothing. London: Chapman & Hall; 1995.
- Wang CY, Wang S, Gutierrez RG, Carroll RJ. Local Linear Regression for Generalized Linear Models With Missing Data. The Annals of Statistics. 1998;26:1028–1050.
- Zhang D, Lin X, Sowers M. Semiparametric Regression for Periodic Longitudinal Hormone Data From Multiple Menstrual Cycles. Biometrics. 2000;56:31–39.
