

Article sections

- Summary
- 1. Introduction
- 2. General Coarsened Data Framework and Coarsening at Random
- 3. Inferential Objective and Doubly Robust Estimators
- 4. Existing and Proposed Doubly Robust Estimators
- 5. Application to ACTG 175
- 6. Simulation Studies
- 7. Discussion
- Supplementary Material
- References


Biometrics. Author manuscript; available in PMC 2012 June 1.

Published online 2010 August 19. doi: 10.1111/j.1541-0420.2010.01476.x

PMCID: PMC3061242

NIHMSID: NIHMS222460

The publisher's final edited version of this article is available at Biometrics


A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to dropout, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators. In this context, these estimators involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, and they have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial.

Studies in which data are to be collected longitudinally according to a pre-determined schedule are often complicated by dropout, where some subjects leave the study prematurely and do not return, so that the intended data from the point of dropout onward are missing. Ordinarily, interest focuses on questions that can be formalized within a statistical model describing aspects of the distribution of the full data, the data that would have been collected on a subject had dropout not occurred. Failure to take dropout into account in analyses based on the observed data, which are curtailed due to dropout for some participants, can lead to biased inferences on full data model parameters, and a vast literature exists on methods for making valid inferences based on the observed data under different assumptions regarding the dropout mechanism; e.g., Hogan, Roy, and Korkontzelou (2004), Philipson, Ho, and Henderson (2008), and Molenberghs and Fitzmaurice (2009) and the references therein.

As a running example, we consider data from AIDS Clinical Trials Group (ACTG) Protocol 175 (Hammer et al., 1996), in which subjects infected with human immunodeficiency virus (HIV) were randomized to four antiretroviral regimens: zidovudine (ZDV), ZDV+didanosine (ZDV+ddI), ZDV+zalcitabine (ZDV+ddC), and didanosine (ddI). In each arm, CD4 T-cell count (cells/mm³ blood), a measure of immunologic status, was measured at baseline and, ideally, at 20±5, 40±5, 60±5, and 96±5 weeks post-baseline, and several baseline covariates were also recorded. Because the latter three regimens showed no differences, we focus on estimating mean CD4 count at 96±5 weeks for the population of subjects assigned to any of the three. Of the 1838 such participants, 12%, 30%, 38%, and 49% had dropped out by the 20-, 40-, 60-, and 96-week visits, respectively. This substantial dropout clearly complicates inference on the population mean.

Missingness due to dropout in a longitudinal study is a special case of monotone coarsening. Under coarsening, for each subject, one of a set of *M* + 1 many-to-one functions of the full data, indexed by *r* = 1, …, *M*, ∞, is observed (Heitjan and Rubin, 1991; Gill, van der Laan, and Robins, 1997; Tsiatis, 2006). With monotone coarsening, the many-to-one function for any *r* = 1, …, *M* is itself a many-to-one function of the (*r* + 1)th function, so that *r* = 1 corresponds to the “most coarsened” data and *r* = *M* to the least, and ∞ denotes no coarsening (the full data are observed). Monotone dropout in a longitudinal study fits into this framework, with *r* indexing *M*+1 planned data collection times, where *r* = 1 corresponds to baseline. Here, the coarsened data at level *r* are the data that would be observed on a subject who is present for the *r*th visit and then drops out prior to the (*r* + 1)th visit.
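The nesting can be made concrete with a minimal Python sketch. It assumes the dropout case, in which *Z* is the vector of measurements at the *M* + 1 planned visits and *G*_{r}(*Z*) keeps the first *r* of them. The helper names `coarsen` and `is_monotone`, and the use of `float("inf")` to stand for the level ∞, are hypothetical choices made only for illustration.

```python
from typing import Sequence, Tuple

INF = float("inf")  # stands in for the "no coarsening" level r = infinity

def coarsen(z: Sequence[float], r: float) -> Tuple[float, ...]:
    """Return G_r(Z) for dropout-type monotone coarsening.

    Hypothetical convention: z holds the measurements at the M + 1 planned
    visits in time order; level r (1..M) keeps the first r of them, and
    r = INF keeps all of Z.
    """
    if r == INF:
        return tuple(z)
    if not 1 <= r < len(z):
        raise ValueError("r must be in 1..M or INF")
    return tuple(z[: int(r)])

def is_monotone(z: Sequence[float]) -> bool:
    """Check that G_r(Z) can be recovered from G_{r+1}(Z) for every r."""
    m = len(z) - 1
    levels = list(range(1, m + 1)) + [INF]
    for lo, hi in zip(levels, levels[1:]):
        g_lo, g_hi = coarsen(z, lo), coarsen(z, hi)
        # Under dropout, G_r is literally a prefix of G_{r+1}.
        if g_hi[: len(g_lo)] != g_lo:
            return False
    return True

z = (350, 340, 360, 355, 330)   # made-up CD4 counts at M + 1 = 5 visits
g2 = coarsen(z, 2)              # data seen by someone who leaves after visit 2
```

Here `g2` is `(350, 340)` and `is_monotone(z)` is `True`. This is exactly the property that makes dropout a special case of monotone coarsening.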

Analogous to the notion of missing at random (MAR), the mechanism leading to coarsening is coarsening at random (CAR, Heitjan and Rubin, 1991) if, for each *r*, the probability that, given the full data, the data are coarsened at level *r* depends only on the coarsened data (so not on data not observed at level *r*). Whether or not the CAR assumption is reasonable must of course be critically evaluated by the analyst; when it is plausible, a number of approaches have been proposed for making inference on full data model parameters based on the observed, coarsened data. These include likelihood methods, where a parametric model for the entire full data distribution may be posited, from which the likelihood based on the coarsened data can be deduced without the need to specify the coarsening mechanism (e.g., Birmingham, Rotnitzky, and Fitzmaurice, 2003; Little, 2009). These methods will yield valid inferences as long as the posited full data model is correct, but can lead to bias otherwise. In contrast, inverse probability weighted methods (IPW) (Robins, Rotnitzky, and Zhao, 1994, 1995; Rotnitzky, Robins, and Scharfstein, 1998; Rotnitzky, 2009) require specification of models for the coarsening probabilities, and the resulting estimators are consistent only if these models are correct and can be unstable in practice if some probabilities of observing the full data are close to zero, leading to large inverse weights. Robins et al. (1994) identified a class of “augmented” IPW (AIPW) estimators that, in the present context, involve (parametric) modeling of both the coarsening probabilities and the conditional expectations of certain functions of the full data given the coarsened data for each level of coarsening. The efficient member of this class, with smallest asymptotic variance, is obtained when both sets of models are correctly specified. 
Scharfstein, Rotnitzky, and Robins (1999) noted that estimators in this class are consistent even if one of the sets of models (but not both) is misspecified. Such estimators are referred to as “doubly robust” (DR) and have been advocated owing to the protection this feature affords (Bang and Robins, 2005). Bang and Robins (2005) described a DR estimator in the case of a longitudinal study with dropout and provided simulation evidence demonstrating the DR property; see also Seaman and Copas (2009).
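The double robustness property can be checked numerically in the simplest case, *M* = 1, where a single outcome is missing at random given a covariate. The sketch below uses simulated data and a hypothetical helper `aipw_mean`. It applies the standard AIPW form for a mean, *RY*/*π* − {(*R* − *π*)/*π*}*h*(*X*), where *R* = *I*(*C* = ∞). It illustrates the general idea only and is not the estimator developed in this paper.

```python
import numpy as np

def aipw_mean(y, r, pi, h):
    """AIPW estimate of E(Y) with Y missing at random given X.

    y  : outcomes; entries with r == 0 are never used
    r  : 1 if Y is observed, 0 otherwise
    pi : fitted P(R = 1 | X) from the missingness model
    h  : fitted E(Y | X) from the outcome regression model
    """
    y_obs = np.where(r == 1, y, 0.0)
    return np.mean(r * y_obs / pi - (r - pi) / pi * h)

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)         # true E(Y) = 1
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))     # MAR: depends on x only
r = rng.binomial(1, pi_true)

h_true = 1.0 + 2.0 * x                         # correct regression
h_wrong = np.zeros(n)                          # badly wrong regression
pi_wrong = np.full(n, r.mean())                # wrong (constant) missingness model

est_pi_ok = aipw_mean(y, r, pi_true, h_wrong)  # only the missingness model correct
est_h_ok = aipw_mean(y, r, pi_wrong, h_true)   # only the regression correct
naive = y[r == 1].mean()                       # complete-case mean
```

When only the missingness model is correct (*h* ≡ 0), the estimator reduces to the inverse-weighted mean. When only the regression is correct, the augmentation term repairs the wrong weights. Both estimates should land near 1. The complete-case mean, by contrast, is biased upward, because subjects with larger *x* are more likely to be observed.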

Despite their obvious appeal, DR estimators have been vigorously criticized. Kang and Schafer (2007) presented simulations in the simple situation of estimation of a population mean from an iid sample with MAR response showing that the usual DR estimator can exhibit severe bias when both sets of models are only “slightly” misspecified and/or when some probabilities of observing full data are close to zero and argued against use of DR estimators. In this setting, however, Tan (2006, 2007, 2008) and Cao, Tsiatis, and Davidian (2009) showed how to construct DR estimators that do not have these shortcomings; see also Goetgeluk, Vansteelandt, and Goetghebeur (2009). Cao et al. (2009) set out expressly to identify the “best” DR estimator, that with smallest asymptotic variance if the coarsening probabilities are correctly specified regardless of whether or not the conditional expectation models are, and demonstrated that these estimators are relatively more efficient and exhibit superior robustness to slight modeling mishaps relative to other DR estimators.

In this paper, we extend these ideas to the general setting of monotonely coarsened data. In Section 2, we introduce notation and formalize the CAR assumption. We state the inferential objectives and describe the general form of DR estimators in Section 3, and in Section 4 propose an improved DR estimator, which we specialize to the case of a longitudinal study with dropout through application to the ACTG 175 data in Section 5. Simulations presented in Section 6 exhibit the improved performance of the proposed methods.

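We follow Tsiatis (2006, Section 7.1). Denote the full data by *Z*; ideally, then, the data intended to be collected are realizations of independent and identically distributed (iid) *Z*_{1}, …, *Z*_{n}. Let *C* denote the level of coarsening, taking values in {1, …, *M*, ∞}, and let *G*_{r}(*Z*) denote the many-to-one function of *Z* observed when *C* = *r*, with *G*_{∞}(*Z*) = *Z*; the observed data on subject *i* are then (*C*_{i}, *G*_{C_i}(*Z*_{i})).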

As is customary in general missing data problems, we assume that there is a positive probability of observing the full data; i.e., we make the positivity assumption *P*(*C* = ∞|*Z*) ≥ *ε* > 0 almost everywhere. The CAR assumption may be expressed as

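$$P(C = r \mid Z) = \pi\{r, G_r(Z)\}, \qquad r = 1, \ldots, M, \tag{1}$$

i.e., the probability of coarsening at level *r* depends on the full data *Z* only through the data *G*_{r}(*Z*) observed at that level, via the function *π*{*r*, *G*_{r}(*Z*)}. Consequently, *π*(∞, *Z*) = *P*(*C* = ∞ | *Z*) = 1 − Σ_{r=1}^{M} *π*{*r*, *G*_{r}(*Z*)}.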

We now demonstrate how data from a longitudinal study with dropout fit into this framework, using notation popularized by Robins and colleagues (e.g., Bang and Robins, 2005). Let *L*_{j} be the vector of information collected at visit time *t*_{j}, *j* = 1, …, *M* + 1.

We suppose that the analyst has specified a semiparametric model for the full data *Z* corresponding to density *p _{Z}*(

For ACTG 175, with *Y* = *Y*_{5} = CD4 count at 96±5 weeks, *β* = *E*(*Y*). With no further assumptions, *p _{Z}*(

In general, we assume that estimators for *β* exist based on the full data, defined by (*p* × 1) unbiased estimating functions *m*(*Z*, *β*); i.e., such that *E*_{β}{*m*(*Z*, *β*)} = 0 for all *β* (or at least for *β* in a neighborhood of *β*_{0}, the true value). An estimator would solve

$$\sum_{i=1}^{n} m(Z_i, \beta) = 0$$

and, under regularity conditions, would be consistent and asymptotically normal by standard M-estimator theory (Stefanski and Boos, 2002). For the mean CD4 count at 96±5 weeks in ACTG 175, *m*(*Z*, *β*) = *Y* − *β*; for the slope parameter, *m*(*Z*, *β*) would be a GEE estimating function, possibly involving nuisance parameters in a “working” correlation structure.
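As a concrete instance of a full-data estimating function for a slope, the sketch below assumes a linear mean *E*(*Y*_{j}) = *β*_{0} + *β*_{1}*t*_{j} and a working-independence correlation structure. That structure is our assumption; the text leaves the GEE unspecified. Under it, Σ_{i} *D*_{i}^{T}(*Y*_{i} − *D*_{i}*β*) = 0 is solved by pooled least squares. The visit weeks follow the ACTG 175 schedule. The data are simulated, and the function name is hypothetical.

```python
import numpy as np

def gee_independence_slope(times, y):
    """Solve sum_i D^T (Y_i - D beta) = 0 with design rows D = [1, t_j].

    Working-independence GEE (an illustrative choice). With complete,
    balanced data every subject shares the same D, so the solution is
    pooled ordinary least squares.
    """
    n, k = y.shape
    D = np.column_stack([np.ones(k), times])   # k x 2 design, same for all
    lhs = n * (D.T @ D)                        # sum_i D^T D
    rhs = D.T @ y.sum(axis=0)                  # sum_i D^T Y_i
    return np.linalg.solve(lhs, rhs)           # (intercept, slope)

rng = np.random.default_rng(3)
times = np.array([0.0, 20.0, 40.0, 60.0, 96.0])  # planned visit weeks
n = 2000
b = rng.normal(0.0, 50.0, size=n)                # subject-level intercept shifts
y = 350.0 - 0.5 * times + b[:, None] + rng.normal(0.0, 30.0, size=(n, times.size))
beta_hat = gee_independence_slope(times, y)
```

With complete data the working-independence choice still gives consistent estimates, although a better working correlation can improve efficiency. With dropout, this full-data equation is what the IPW and AIPW constructions below reweight and augment.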

We start with the premise that the analyst has fully specified the coarsening probabilities (1), so that they involve no unknown parameters, a requirement we relax in Section 4. For general monotonely coarsened data, the theory of Robins et al. (1994) implies that, under CAR, if the coarsening probabilities *π* {*r*, *G _{r}*(

(2)

(Tsiatis, 2006, Chapter 10) where *π* {*r*, *G _{r}*(

When the *π* {*r*, *G _{r}*(

(3)

where $\hat{\xi}$ is some estimator for *ξ*. The method of estimating *ξ* is key; see Section 4.

Writing *m*(*Z*) = *m*(*Z*, *β*_{0}), if the coarsening probabilities are correctly specified and $\hat{\xi}$ converges in probability to some *ξ*^{*}, say, an estimator for *β* solving (3) will be consistent and asymptotically normal regardless of whether or not *h _{r}* {

In the context of a longitudinal study with dropout, Bang and Robins (2005) described estimators for *β* that are solutions to (3); we present details in the next section.
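For the running example, *m*(*Z*, *β*) = *Y* − *β*, and the augmented estimating equation has a closed-form root. The sketch below assumes the monotone-coarsening AIPW form of Tsiatis (2006, Chapter 10). It is written with discrete hazards *λ*_{r} = *P*(*C* = *r* | *C* ≥ *r*, *Z*) and *K*_{r} = Π_{j≤r}(1 − *λ*_{j}), so that *π*(∞, *Z*) = *K*_{M}. Because $I(C=\infty)/K_M + \sum_{r=1}^{M}\{I(C=r) - \lambda_r I(C \ge r)\}/K_r = 1$, solving for *β* gives

$$\hat\beta = n^{-1}\sum_{i=1}^{n}\Big[\frac{I(C_i=\infty)Y_i}{K_M} + \sum_{r=1}^{M}\frac{I(C_i=r)-\lambda_r I(C_i\ge r)}{K_r}\,h_r\{G_r(Z_i)\}\Big],$$

where *h*_{r} approximates *E*{*Y* | *G*_{r}(*Z*)}. Several choices are hypothetical: the three-visit design (*M* = 2), the hazard coefficients, and the sequential least-squares fits for *h*_{r}. The hazards are treated as known, matching the premise at the start of this section.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def hazards(y1, y2):
    """Assumed-known discrete hazards lambda_1{G_1(Z)}, lambda_2{G_2(Z)}."""
    return expit(-1.0 + 0.8 * y1), expit(-1.0 + 0.8 * y2)

def simulate(n, rng):
    """Hypothetical study, M = 2: Y1 at baseline, Y2, then Y = Y3 with E(Y) = 1."""
    y1 = rng.normal(size=n)
    y2 = y1 + rng.normal(size=n)
    y3 = y2 + 1.0 + rng.normal(size=n)
    lam1, lam2 = hazards(y1, y2)
    u1, u2 = rng.uniform(size=n), rng.uniform(size=n)
    c = np.where(u1 < lam1, 1, np.where(u2 < lam2, 2, 3))  # 3 codes C = infinity
    # Hide what dropout hides: Y2 needs C >= 2, Y3 needs C = infinity.
    return y1, np.where(c >= 2, y2, np.nan), np.where(c == 3, y3, np.nan), c

def ols_predict(X_fit, y_fit, X_new):
    coef, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    return X_new @ coef

def aipw_monotone_mean(y1, y2, y3, c, fit_h=True):
    """Root of the AIPW equation for m(Z, beta) = Y - beta under monotone dropout."""
    y2z, y3z = np.nan_to_num(y2), np.nan_to_num(y3)  # zero fills are always masked
    lam1, lam2 = hazards(y1, y2z)
    K1 = 1.0 - lam1
    K2 = K1 * (1.0 - lam2)                           # pi(infinity, Z) = K_M
    full, at2 = c == 3, c >= 2
    h1, h2 = np.zeros_like(y1), np.zeros_like(y1)
    if fit_h:
        # Sequential regressions: h_2 = E(Y | Y1, Y2) fit on completers,
        # then h_1 = E(h_2 | Y1) fit on subjects still present at visit 2.
        ones = np.ones_like(y1)
        X2 = np.column_stack([ones, y1, y2z])
        h2 = ols_predict(X2[full], y3z[full], X2)
        X1 = np.column_stack([ones, y1])
        h1 = ols_predict(X1[at2], h2[at2], X1)
    ipw = np.where(full, y3z / K2, 0.0)
    aug1 = ((c == 1).astype(float) - lam1) / K1 * h1  # everyone is at risk at r = 1
    aug2 = np.where(at2, ((c == 2).astype(float) - lam2) / K2 * h2, 0.0)
    return np.mean(ipw + aug1 + aug2)

rng = np.random.default_rng(1)
y1, y2, y3, c = simulate(200_000, rng)
est_dr = aipw_monotone_mean(y1, y2, y3, c)
est_ipw = aipw_monotone_mean(y1, y2, y3, c, fit_h=False)  # h_r = 0: pure IPW
```

Because the hazards are correct, both estimates should be near *E*(*Y*) = 1. The fitted *h*_{r} mainly reduce variance. Here dropout is more likely for larger current values, so completers have lower outcomes, and the complete-case mean of `y3` understates *E*(*Y*).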

We continue to assume that the coarsening probabilities *π*{*r*, *G _{r}*(

Denote the true coarsening probabilities as *π*_{0}{*r*, *G _{r}*(

(4)

where *ξ*^{*} is the value to which the estimator used converges in probability. Denote this minimizing value by *ξ ^{opt}*. If the models

From (4), *ξ ^{opt}* must satisfy

where *h _{r ξ}*{

(5)

We now derive an estimator * _{opt}* for

(6)

where *q _{r}* {

We propose the estimator * _{opt}* found by taking

(7)

With (7) substituted, it may be shown (see Web Appendix B) that (6) has expectation zero at *ξ* = *ξ ^{opt}*, where

Summarizing, the proposed estimator * _{opt}*, found by using (7) in the estimating function (6) for

For either estimator, in order to be feasible models for *E*{*m*(*Z*)|*G _{r}*(
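The special case *M* = 1 may help intuition. Take a single mean and a known *π*(*X*) = *P*(*R* = 1 | *X*), and write *A*(*ξ*) = *RY*/*π* − (*R* − *π*)*h*(*X*, *ξ*)/*π*. Then *E*{*A*(*ξ*)} = *E*(*Y*) for every *ξ*. Conditioning on (*X*, *Y*) under MAR gives *E*{*A*(*ξ*)²} = *E*(*RY*²/*π*²) − 2*E*{(1 − *π*)*Yh*/*π*} + *E*{(1 − *π*)*h*²/*π*}, so minimizing the variance over *ξ* is equivalent to minimizing *E*{*R*(1 − *π*)(*Y* − *h*)²/*π*²}. This is our own calculation for this special case, in the spirit of the "expected squared residual" reading of (4) given in Section 7. It is not the general estimator (5)–(7), which also handles several coarsening levels and estimated probabilities. With a linear *h*, the criterion is weighted least squares among complete cases. The helper names below are hypothetical.

```python
import numpy as np

def xi_weighted_ls(X, y, r, pi):
    """Minimize sum_i R_i (1 - pi_i) / pi_i**2 * (Y_i - X_i @ xi)**2 over xi."""
    w = np.where(r == 1, (1.0 - pi) / pi**2, 0.0)
    y_obs = np.where(r == 1, y, 0.0)
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y_obs)

def aipw_with_xi(X, y, r, pi, xi):
    """AIPW mean with h(X, xi) = X @ xi."""
    h = X @ xi
    y_obs = np.where(r == 1, y, 0.0)
    return np.mean(r * y_obs / pi - (r - pi) / pi * h)

rng = np.random.default_rng(4)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + x + x**2 + rng.normal(size=n)    # true E(Y) = 2
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))      # known, correct missingness model
r = rng.binomial(1, pi)
X = np.column_stack([np.ones(n), x])       # deliberately misspecified h (no x**2)
xi_opt = xi_weighted_ls(X, y, r, pi)
est = aipw_with_xi(X, y, r, pi, xi_opt)
```

The weights (1 − *π*)/*π*² put the most emphasis on fitting well where inverse weights are large. This matches the stabilizing behavior conjectured in Section 6.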

The coarsening probabilities are unlikely to be known except in a study where coarsening is by design. Thus, it is natural to postulate and fit parametric models for the coarsening mechanism (Tsiatis, 2006, Section 8.2); e.g., model the discrete hazards *λ _{r}*{
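The discrete hazards determine the coarsening probabilities as follows: *K*_{r} = Π_{j=1}^{r}(1 − *λ*_{j}), *π*{*r*, *G*_{r}(*Z*)} = *λ*_{r}*K*_{r−1} with *K*_{0} = 1, and *π*(∞, *Z*) = *K*_{M}. The sketch below checks this bookkeeping with a hypothetical helper `coarsening_probs`. For input it uses the marginal ACTG 175 dropout proportions quoted in the Introduction, read as cumulative proportions missing at the four follow-up visits. These marginal hazards ignore covariates, unlike fitted hazard models, and serve only to illustrate the arithmetic.

```python
import numpy as np

def coarsening_probs(hazards):
    """Return (pi_r for r = 1..M, pi_inf, K) from discrete hazards lambda_1..lambda_M."""
    lam = np.asarray(hazards, dtype=float)
    K = np.cumprod(1.0 - lam)                  # K_r = prod_{j<=r} (1 - lambda_j)
    K_prev = np.concatenate([[1.0], K[:-1]])   # K_{r-1}, with K_0 = 1
    pi_r = lam * K_prev                        # P(C = r) = lambda_r * K_{r-1}
    return pi_r, K[-1], K                      # pi(infinity) = K_M

# Cumulative share gone by weeks 20, 40, 60, 96 (marginal, no covariates).
cum = np.array([0.12, 0.30, 0.38, 0.49])
p_r = np.diff(np.concatenate([[0.0], cum]))        # P(C = r)
at_risk = 1.0 - np.concatenate([[0.0], cum[:-1]])  # P(C >= r)
lam = p_r / at_risk                                # marginal hazards
pi_r, pi_inf, K = coarsening_probs(lam)
```

The round trip returns *P*(*C* = *r*) = 0.12, 0.18, 0.08, 0.11 and *π*(∞) = 0.51, and these sum to one.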

As detailed in Tsiatis (2006, Chapters 8–10), modeling the coarsening probabilities and estimating *ψ* by the maximum likelihood estimator (MLE) $\hat{\psi}$ affects the asymptotic distribution of an estimator for *β* solving (3). In particular, it follows from Theorem 9.1 of Tsiatis (2006) that, when the models for the coarsening probabilities are correctly specified, so that there exists *ψ*_{0} such that *λ _{r}*{

(8)

for some $\hat{\xi}$ is asymptotically equivalent to that solving

(9)

Here, *ξ*^{*} is the limit in probability of $\hat{\xi}$, and *θ _{proj}* is the value of

(10)

when *ξ*^{*} is substituted for *ξ*, and

(11)

Referring to (4), which defines *ξ ^{opt}* assuming

As, in practice, *ψ*_{0} is unknown, write * _{r}*{

(13)

and * _{jθ}*{

Noting that $\hat{\psi}$ converges in probability to *ψ*_{0} when the coarsening probabilities are modeled correctly, if they are correct but the *h _{r}*{

Because the proposed estimator is an M-estimator, and similarly for $\hat{\beta}_{\mathrm{br}*}$, the asymptotic covariance matrix of each may be approximated by the empirical sandwich method (Stefanski and Boos, 2002). The approximation is consistent for the true sampling covariance matrix regardless of whether one or both sets of models are misspecified; see Web Appendix D.
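The empirical sandwich estimator has the generic form *A*^{−1}*B*(*A*^{−1})^{T}/*n*. Here *A* = −*n*^{−1}Σ_{i}∂*U*_{i}/∂*θ*^{T} and *B* = *n*^{−1}Σ_{i}*U*_{i}*U*_{i}^{T}, both evaluated at the estimate, where *U*_{i}(*θ*) are the stacked per-subject estimating functions. In the present setting, *U*_{i} would stack the equations for *β*, *ξ*, and *ψ*. The sketch below instead uses a toy mean-and-variance problem, where the sandwich variance of the mean reduces to *σ̂*²/*n*. It approximates *A* by central finite differences, and the helper names are hypothetical.

```python
import numpy as np

def sandwich_cov(estfun, theta_hat, data, eps=1e-6):
    """Empirical sandwich covariance A^{-1} B A^{-T} / n.

    estfun(theta, data) returns an (n, p) array whose rows are U_i(theta);
    A = -mean dU_i/dtheta is approximated by central finite differences.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    U = estfun(theta_hat, data)
    n, p = U.shape
    A = np.empty((p, p))
    for k in range(p):
        step = np.zeros(p)
        step[k] = eps
        dU = (estfun(theta_hat + step, data) - estfun(theta_hat - step, data)) / (2.0 * eps)
        A[:, k] = -dU.mean(axis=0)
    B = U.T @ U / n
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T / n

def estfun_mean_var(theta, y):
    """Toy stacked estimating functions for (mean, variance)."""
    mu, s2 = theta
    return np.column_stack([y - mu, (y - mu) ** 2 - s2])

rng = np.random.default_rng(2)
y = rng.normal(3.0, 2.0, size=1000)
theta_hat = np.array([y.mean(), y.var()])  # roots of the summed estimating functions
cov = sandwich_cov(estfun_mean_var, theta_hat, y)
```

Stacking all estimating functions into one *U* propagates the uncertainty from estimating *ψ* and *ξ* into the variance of the estimator for *β* automatically.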

An alternative approach to estimation of *ξ* would be to extend the methods of Tan (2006, 2007, 2008) to the setting of monotonely coarsened data. Given that the proposed approach is optimal when the discrete hazard models are correct, theoretically, such an extension would be no more efficient in this case. In simulations in Cao et al. (2009), the proposed approach outperformed that of Tan for estimation of a single population mean under both correct and incorrect models, and we would expect to see similar relative performance here.

We now demonstrate how the foregoing development is specialized to a longitudinal study with dropout by application to ACTG 175. Recall that interest focuses on *β* = *E*(*Y*), where *Y* = *Y*_{5} = CD4 count at 96±5 weeks, the mean CD4 count for the HIV-infected population if assigned to regimens ZDV+ddI, ZDV+ddC, or ddI; *M* = 4; and *m*(*Z*, *β*) = *Y* − *β*. The baseline covariate vector *X* includes age (years); weight (kg); Karnofsky score (karnof), an index reflecting ability to perform activities of daily living (0 to 100); days of prior antiretroviral therapy (antidays); and binary indicator variables for hemophilia (hemo), homosexual activity (homo), history of intravenous drug use (drug), ZDV within 30 days of the trial, race (0 = white), gender (0 = female), antiretroviral history (hist; 0 = naive, 1 = experienced), and symptomatic status (symp; 0 = asymptomatic).

We consider estimation of *β* by the simple IPW estimator, which corresponds to solving (8) with all of the *h _{r}*{
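Assuming all *h*_{r} ≡ 0 in (8), the remaining equation is Σ_{i} *I*(*C*_{i} = ∞)(*Y*_{i} − *β*)/*π*(∞, *Z*_{i}, $\hat{\psi}$) = 0. For *m*(*Z*, *β*) = *Y* − *β* this has the closed-form root *β̂* = Σ_{i}*w*_{i}*Y*_{i}/Σ_{i}*w*_{i}, with *w*_{i} = *I*(*C*_{i} = ∞)/*π*(∞, *Z*_{i}, $\hat{\psi}$). In other words, the root is a normalized weighted mean of the completers. Below is a minimal sketch with made-up numbers and a hypothetical helper name.

```python
import numpy as np

def ipw_mean(y, complete, pi_inf):
    """Root of sum_i I(C_i = inf) (Y_i - beta) / pi(inf, Z_i) = 0."""
    w = np.where(complete, 1.0 / pi_inf, 0.0)
    y_obs = np.where(complete, y, 0.0)   # outcomes of dropouts never enter
    return np.sum(w * y_obs) / np.sum(w)

y = np.array([300.0, 400.0, np.nan, 350.0])     # made-up CD4 counts; NaN = missing
complete = np.array([True, True, False, True])  # I(C = infinity)
pi_inf = np.array([0.5, 0.8, 0.6, 0.25])        # estimated pi(infinity, Z)
beta_ipw = ipw_mean(y, complete, pi_inf)        # 2500 / 7.25, about 344.83
```

The completer with *π* = 0.25 gets weight 4 and dominates the estimate. Such large weights are the source of the instability discussed in Section 6.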

Using the notation at the end of Section 2, we represent the models we now present by replacing *C* by *R* and *G _{r}*(

Noting that *E*{*m*(*Z*)|* _{j}*} =

(14)

where *α _{i}* = (

Estimation of *ξ* by solution of the estimating equations based on (12) may be carried out via standard techniques, such as a Newton-Raphson updating scheme. Thus, in principle, implementation is no more complex than for the Bang-Robins approach, where the *ξ _{r}* are estimated by separate solutions to
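For an estimating equation *U*(*θ*) = 0, Newton-Raphson iterates *θ* ← *θ* + {−∂*U*/∂*θ*^{T}}^{−1}*U*(*θ*). The sketch below applies this update to a logistic model for a discrete hazard, the kind of model one might posit for *λ*_{r} when computing $\hat{\psi}$. Several pieces are assumptions, not the specification used in this analysis: the logistic form, the one-row-per-subject-at-risk layout, and the name `fit_logistic_newton`. The same update applies to the estimating equations for *ξ*.

```python
import numpy as np

def fit_logistic_newton(X, d, tol=1e-10, max_iter=50):
    """Maximum likelihood for P(d = 1 | X) = expit(X @ psi) via Newton-Raphson."""
    psi = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ psi)))
        score = X.T @ (d - p)                        # estimating function U(psi)
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # -dU/dpsi
        step = np.linalg.solve(info, score)
        psi = psi + step
        if np.max(np.abs(step)) < tol:
            break
    return psi

# Hypothetical hazard data: one row per subject at risk at a given visit,
# d = 1 if that visit is the subject's last (drops out before the next).
rng = np.random.default_rng(5)
n = 50_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x))))
psi_hat = fit_logistic_newton(X, d)
```

For the logistic log-likelihood the observed and expected information coincide, so this is also Fisher scoring. Convergence from a zero start is typically fast for well-behaved data.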

For comparison, we also fit the mixed model (14) directly by normal ML using SAS proc mixed (SAS Institute, 2009). We estimated *β* by the marginal predicted value $\hat{\beta}_{\mathrm{mixed}}$ at 96±5 weeks, obtained by setting the covariates equal to their sample means, with SE from the associated estimate statement, which treats those sample means as fixed.

The resulting $\hat{\beta}_{\mathrm{ipw}}$ = 332.96 (SE 5.10),
Recognizing that this is a single data set, it is encouraging that the estimates are virtually identical and that, consistent with the theory, the IPW estimator is inefficient relative to the AIPW competitors on the basis of estimated SE. Moreover, both versions of the proposed estimator match or surpass the Bang and Robins estimators, although not dramatically, and all estimates are smaller than the naive estimate, as expected. We also obtained

We deliberately chose the ACTG 175 study to demonstrate the methods because of a unique feature that highlights the advantage of consideration of the general setting of monotone coarsening. Although subjects in the study ceased to attend clinic visits and provide CD4 counts after some time point, so effectively did “drop out” of the study with respect to the response of interest, follow-up of all subjects continued. Thus, additional information on each subject throughout the entire 96-week period, regardless of whether or not s/he ceased to attend clinic visits, is available, which we summarize in four time-dependent covariates dis* _{ij}* =

Reverting to the general notation, *Z* = (*X*, *Y*_{1}, *Y*_{2}, *Y*_{3}, *Y*_{4}, *Y*,dis_{1},dis_{2},dis_{3},dis_{4}); and, with *C* = *r* indicating that the subject last provided a CD4 count at visit *r*, we observe *G _{r}*(

Recall that the goal is to estimate *β* = mean CD4 count at 96±5 weeks for the population *assigned* to ZDV+ddI, ZDV+ddC, or ddI, regardless of whether or not subjects stayed on these regimens for the entire 96 weeks. We illustrate by calculating $\hat{\beta}_{\mathrm{opt}*}$ and $\hat{\beta}_{\mathrm{br}*}$ as follows. For both estimators, we derived the discrete hazard models by the same strategy as in the previous analysis, considering all elements of *G _{r}*(

We carried out several simulations to assess the performance of the proposed methods in the case of a longitudinal study with dropout, which we describe using the notation at the end of Section 2. To obtain data for subject *i*, *i* = 1, …, *n*, we generated baseline covariates (*t*_{1} = 0) *X _{i}* = (

For each inverse weight scenario, we considered the four situations of all combinations of correct or incorrect regression models *h _{j}*(

Table 1 summarizes the distributions of estimated inverse weights {*π*(∞, *Z*, $\hat{\psi}$)}^{−1} and [*K _{r}*{

Summaries of distributions of estimates of inverse weights for simulated subjects across 1000 Monte Carlo data sets. Min and Max are the minimum and maximum values across all subjects for whom the indicated inverse weights were calculated across all 1000 **...**

Simulation results for the “moderately large” inverse weight scenario; 1000 Monte Carlo replications. Bias is Monte Carlo bias, RMSE is root mean square error, MCSD is Monte Carlo standard deviation, AveSE is average of sandwich standard **...**

These results suggest that the proposed approach may be more stable than competing methods in situations with extreme estimated inverse weights. The method seeks to obtain as efficient an estimator of *β* as possible under correct discrete hazards models, where the estimator for *ξ* serves only to increase efficiency and is not of inherent interest. Our estimator for *ξ* minimizes the expected squared residual in (4) or (10), where one may think of the residual as subtracting the augmentation term (involving *ξ*) from the IPW complete case term *I*(*C* = ∞)*m*(*Z*)/*π* (∞, *Z*). We conjecture that the resulting choice of *ξ* acts automatically to counteract, to the extent possible, the destabilizing effects that large inverse weights [*π* (∞, *Z*)]^{−1} in particular have on the variance of the estimator for *β*.

We have proposed doubly robust estimators for general semiparametric full data model parameters based on data subject to monotone coarsening at random. A special case is that of longitudinal data subject to MAR dropout. As for a population mean under MAR response in Cao et al. (2009), the methods are designed to achieve asymptotic efficiency equal to or greater than that of other doubly robust estimators when models for the coarsening mechanism are correctly specified, even when the regression models incorporated to increase efficiency are not. In contrast to the simulations of Kang and Schafer (2007), our empirical studies show that doubly robust estimators need not exhibit disastrous performance, even when both sets of models are incorrectly specified. They also show that the proposed estimator may outperform competing methods and be more stable in the presence of very large inverse weights.

As noted in Section 4, the regression models must be consistent with one another for each level of coarsening, which may be ensured through specification of a part of the joint distribution of the full data. The impact of such specification and more generally the role of model selection for both the regressions and coarsening mechanism merits formal study.

Work supported by NIH grants R37 AI031789, R01 CA051962, R01 CA085848, and P01 CA142538.

Web Appendices A-G referenced in Sections 4, 5, and 6 are available under the Paper Information link at the *Biometrics* website http://www.biometrics.tibs.org.

- Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–972.
- Birmingham J, Rotnitzky A, Fitzmaurice GM. Pattern-mixture and selection models for analysing longitudinal data with monotone missing patterns. Journal of the Royal Statistical Society, Series B. 2003;65:275–297.
- Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96:723–734.
- Gill RD, van der Laan MJ, Robins JM. Coarsening at random: characterizations, conjectures and counter examples. Proceedings of The First Seattle Symposium in Biostatistics: Survival Analysis. New York: Springer; 1997. pp. 255–294.
- Goetgeluk S, Vansteelandt S, Goetghebeur E. Estimation of controlled direct effects. Journal of the Royal Statistical Society, Series B. 2009;70:1049–1066.
- Hammer SM, Katzenstein DA, Hughes MD, Gundaker H, Schooley RT, Haubrich RH, Henry WK, Lederman MM, Phair JP, Niu M, Hirsch MS, Merigan TC; for the AIDS Clinical Trials Group Study 175 Study Team. A trial comparing nucleoside monotherapy with combination therapy in HIV infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine. 1996;335:1081–1089.
- Heitjan DF, Rubin DB. Ignorability and coarse data. The Annals of Statistics. 1991;19:2244–2253.
- Hogan JW, Roy J, Korkontzelou C. Handling drop-out in longitudinal studies. Statistics in Medicine. 2004;23:1455–1497.
- Kang JDY, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data (with discussion and rejoinder). Statistical Science. 2007;22:523–580.
- Little RJA. Selection and pattern-mixture models. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Longitudinal Data Analysis. Boca Raton, FL: Chapman and Hall/CRC; 2009. pp. 409–431.
- Molenberghs G, Fitzmaurice G. Incomplete data: Introduction and overview. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Longitudinal Data Analysis. Boca Raton, FL: Chapman and Hall/CRC; 2009. pp. 395–408.
- Philipson PM, Ho WK, Henderson R. Comparative review of methods for handling drop-out in longitudinal studies. Statistics in Medicine. 2008;27:6276–6298.
- Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
- Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
- Rotnitzky A. Inverse probability weighted methods. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Longitudinal Data Analysis. Boca Raton, FL: Chapman and Hall/CRC; 2009. pp. 453–476.
- Rotnitzky A, Robins JM, Scharfstein DO. Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association. 1998;93:1321–1339.
- SAS Institute. SAS/STAT 9.2 User’s Guide. Cary, NC: SAS Institute Inc.; 2009.
- Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable dropout using semiparametric nonresponse models (with discussion and rejoinder). Journal of the American Statistical Association. 1999;94:1096–1146.
- Seaman S, Copas A. Doubly robust generalized estimating equations for longitudinal data. Statistics in Medicine. 2009;28:937–955.
- Stefanski LA, Boos DD. The calculus of M-estimation. The American Statistician. 2002;56:29–38.
- Tan Z. A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association. 2006;101:1619–1637.
- Tan Z. Understanding OR, PS and DR. Statistical Science. 2007;22:560–568.
- Tan Z. Comment: Improved local efficiency and double robustness. The International Journal of Biostatistics. 2008;4(1): Article 10. doi: 10.2202/1557-4679.1109.
- Tsiatis AA. Semiparametric Theory and Missing Data. New York: Springer; 2006.
