
Stat Med. Author manuscript; available in PMC 2010 December 2.

Published in final edited form as:

PMCID: PMC2996053

NIHMSID: NIHMS254319

Andrea B. Troxel,^{1,}^{*} Stuart R. Lipsitz,^{2} Garrett M. Fitzmaurice,^{2} Joseph G. Ibrahim,^{3} Debajyoti Sinha,^{4} and Geert Molenberghs^{5}

The publisher's final edited version of this article is available at Stat Med

For longitudinal binary data with non-monotone non-ignorably missing outcomes over time, a full likelihood approach is complicated algebraically, and with many follow-up times, maximum likelihood estimation can be computationally prohibitive. As alternatives, two pseudo-likelihood approaches have been proposed that use minimal parametric assumptions. One formulation requires specification of the marginal distributions of the outcome and missing data mechanism at each time point, but uses an “independence working assumption,” i.e., an assumption that observations are independent over time. Another method avoids having to estimate the missing data mechanism by formulating a “protective estimator.” In simulations, these two estimators can be very inefficient, both for estimating time trends in the first case and for estimating both time-varying and time-stationary effects in the second. In this paper, we propose use of the optimal weighted combination of these two estimators, and in simulations we show that the optimal weighted combination can be much more efficient than either estimator alone. Finally, the proposed method is used to analyze data from two longitudinal clinical trials of HIV-infected patients.

Longitudinal studies in which each subject is to be observed at a fixed number of time points have become very popular in social science and medical applications. For example, longitudinal data are often collected in AIDS, cardiovascular, and cancer clinical trials and observational studies. We focus on the case where the response variable over time is binary (e.g., success or failure) and are interested in modeling the marginal means or success probabilities; this setting has been well-described [1, 2, 3, 4, 5]. Such modeling is often complicated by the fact that in longitudinal studies, the outcome is not always observed at all assessment times. In addition, the missing data are often non-ignorable [6], since the probability that an outcome is missing at a given time can depend on the potentially missing value of the outcome at that time. The missing outcome data must be properly accounted for in the analysis, and numerous approaches have been proposed [7, 8, 9, 10, 11]. In clinical trials, an individual’s response is often missing at one follow-up time but observed at the next follow-up time, resulting in a large class of distinct missingness patterns, often called “non-monotone” missingness. We will, however, assume that all subjects have the outcome measured at the first time point; e.g., to be part of the study, the subject must be seen at baseline.

An example of a data set with this structure comes from two longitudinal clinical trials of HIV-infected patients sponsored by the AIDS Clinical Trials Group (ACTG): ACTG 116A [12] and 116B/117 [13]. The two studies were randomized phase III double-blind trials, designed to compare two treatments, zidovudine (AZT) and didanosine (ddI); they differed with respect to the length of prior treatment with AZT and have been used in several combined analyses [14]. The response of interest is normal CD4 cell count (> 200 cells per cubic millimeter) versus abnormal CD4 cell count (≤ 200) measured at baseline (week 0) and every week for up to 5 weeks from baseline. The cutoff of 200 was initially chosen because of its strong predictive value for development of opportunistic infections and has been adopted as a standard threshold of clinical importance [15]. Previously, we analyzed these data for HIV patients with and without AIDS [16]; here we consider only the 431 patients with AIDS at baseline. The main question of scientific interest is the effect of treatment on changes in CD4 cell count sufficiency over time. As with most longitudinal studies, missing outcome data over time complicate the analysis. For example, fewer than 50% of the patients (202/431= 46.9%) have outcomes measured at all 6 occasions.

Table I shows the number of subjects seen at each of the six possible occasions. In Table I, we see that 383 of the 431 patients (88.9%) had a measurement at week 1; the percentage of patients seen slowly drops until 285 (66.1%) of the 431 patients are seen at week 5. A majority of the missing data is due to patients who drop out, i.e., once the patient misses a scheduled visit, no further measurements of the response variable are obtained. However, there are 109 (25.3%) patients who missed at least one measurement, but returned for a later measurement. In this setting, it is quite plausible that patients with low CD4 counts, i.e., sicker patients, are more likely to miss the scheduled study visits. If this is true, then missingness depends on the unobserved outcome of interest and is nonignorable. Indeed, some have argued that the only plausible non-monotone missing at random mechanisms are those that derive from randomized monotone missingness processes [17, 18]. In the longitudinal data setting, these processes require that missingness at an assessment depends on the prior assessment if and only if the prior assessment is observed; such processes are thus highly implausible here.

To formulate a full likelihood for non-ignorable non-monotone missing outcomes over time, one must specify a joint distribution for the *T* repeated binary outcomes of interest, of dimension 2^{*T*}, and a model for the missingness mechanism. To estimate the parameters, a full likelihood approach has many nuisance parameters and is complicated algebraically; furthermore, estimation can be computationally prohibitive, especially when the number of times is large. As alternatives to a full likelihood procedure, two pseudo-likelihood [19, 20] procedures have been proposed by Troxel et al. [21] and Fitzmaurice et al. [22] under minimal parametric assumptions.

First, Troxel et al. [21] proposed a pseudo-likelihood that is formed by an “independence working assumption,” i.e., assuming for the purpose of estimation that the longitudinal binary measurements are independent over time and ultimately applying a robust “sandwich” variance estimate [23] to achieve proper inference. Specifically, their pseudo-likelihood first assumes a marginal logistic regression model for the outcome at each time point; it also assumes that the missingness probability at a given time depends only on the possibly missing response at that time and the covariates (the covariates are assumed to be fully observed). The chief attraction of this pseudo-likelihood approach is that it substantially eases the numerical complexities of the full likelihood approach by reducing high-dimensional sums to sums of a single dimension. Further, it alleviates the need to specify and estimate many nuisance parameters that are needed in a full likelihood approach. In addition, asymptotically unbiased estimators of the regression parameters and missingness parameters can be obtained. However, by assuming independence of repeated measures across measurement occasions, the method can be highly inefficient for estimating the regression parameters. For example, results from Table 1 of Troxel et al. [21] indicate that their pseudo-likelihood method can be very inefficient compared to the MLE, and in particular in estimating time trends.

Alternatively, Fitzmaurice et al. [22] proposed a pseudo-likelihood based on the idea of formulating a “protective estimator” [24] without having to estimate the missing data mechanism. Specifically, they assume that the baseline response is fully observed and that the probability that a response is missing at any future occasion is conditionally independent of the baseline response given the response at that occasion. This assumption ensures that the conditional distributions of the outcome at time 1 given the outcome at any future time are fully identifiable. These conditional distributions are functions only of the parameters of primary scientific interest (the regression parameters), and not the parameters of the missing data mechanism. Their pseudo-likelihood is based on the conditional distributions of the baseline response, given the response at each future occasion, for estimation of the regression parameters. The pseudo-likelihood requires only specification of the bivariate distribution of the outcome at time 1 and any future time, and is thus computationally much more feasible than maximum likelihood. The resulting parameter estimates are asymptotically unbiased when the identifying assumption holds. However, the results of their simulation study showed that, with high correlation, the “protective estimator” can be highly efficient for estimating time trends, but inefficient for estimating the effects of time-stationary covariates.

Since the estimate from Troxel et al.’s [21] approach is very inefficient for estimating time trends, and the estimate from Fitzmaurice et al.’s [22] approach is very inefficient for estimating time-stationary effects, this suggests formulating a new estimator of the marginal regression parameters that is a combination of these two estimators. In this paper, we propose forming a new estimator that is the asymptotic minimum variance linear combination of the two estimators [25, 26]. The new estimator is basically a weighted least squares estimate, where the weight matrix is the inverse of the estimated asymptotic covariance matrix of the vector formed from concatenating the Troxel et al. and Fitzmaurice et al. estimates. This estimated asymptotic covariance matrix is obtained using the “sandwich” variance estimator of White [23].

The remainder of the paper is organized as follows. In Section 2, we describe the underlying data models and introduce the necessary notation. In Section 3, we review the pseudo-likelihoods of Troxel et al. and Fitzmaurice et al., and our proposed weighted combination of the two. Section 4 illustrates the methods with the AIDS example. In Section 5, we present results from our simulation study, showing that our proposed estimator produces much more efficient estimates of the time trends than the Troxel et al. estimator, and much more efficient estimates of the time-stationary effects than the Fitzmaurice et al. estimator.

We assume that *n* independent subjects are to be observed at a fixed set of *T* occasions, *t* = 1, …, *T*. For the *i ^{th}* individual (

$$p_{it}=p_{it}(\beta)=E(Y_{it}\mid\mathbf{x}_{it},\beta)=\text{pr}(Y_{it}=1\mid\mathbf{x}_{it},\beta)=\frac{\exp(\mathbf{x}_{it}^{\prime}\beta)}{1+\exp(\mathbf{x}_{it}^{\prime}\beta)}.$$

(1)

In a marginal model, the goal is to make inferences about the marginal regression parameters *β*, whereas the within-subject association among the repeated responses is regarded as a nuisance characteristic of the data. Although the association model is not even specified in the pseudo-likelihood of Troxel et al., the pairwise associations between the outcome at time 1 and the follow-up times must be correctly specified in the protective pseudo-likelihood of Fitzmaurice et al. to obtain consistent estimates. Thus, we briefly discuss the association model here.

The association between a pair of binary outcomes is typically measured in terms of marginal odds ratios [28] or marginal correlations [29]. For ease of exposition, as well as to be compatible with the original protective pseudo-likelihood of Fitzmaurice et al., here we discuss marginal correlations. In general, we propose a generalized autoregressive(1)-type correlation structure. For two different points in time *s* ≠ *t*, the generalized autoregressive(1) model states that

$$\rho_{ist}=\text{Corr}(Y_{is},Y_{it}\mid\mathbf{x}_{i})=\rho^{|t-s|^{\theta}},$$

where −1 *< ρ <* 1 and −∞ < *θ* < ∞. Note that if *θ* = 0, this correlation reduces to an exchangeable correlation, *ρ_{ist}* =
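As an illustration of the generalized autoregressive(1) structure, the following minimal sketch builds the *T* × *T* correlation matrix with off-diagonal entries *ρ* raised to the power |*t* − *s*|^*θ*. The function name `gen_ar1_corr` and the numerical values are ours and purely illustrative; the example uses a positive *ρ*, since a negative base with a non-integer exponent is not real-valued and is not handled here.

```python
def gen_ar1_corr(rho, theta, T):
    """Return the T x T matrix whose (s, t) entry is rho ** (|t - s| ** theta).

    Illustrative helper (not from the paper). Diagonal entries are 1.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [
        [1.0 if s == t else rho ** (abs(t - s) ** theta) for t in range(T)]
        for s in range(T)
    ]


# theta = 0: |t - s| ** 0 = 1, so every off-diagonal entry equals rho.
exchangeable = gen_ar1_corr(0.5, 0.0, 4)

# theta = 1: entries decay as rho ** |t - s|, the usual first-order
# autoregressive pattern.
ar1 = gen_ar1_corr(0.5, 1.0, 4)
```

Intermediate values of *θ* interpolate between these two patterns, so a single extra parameter controls how quickly the correlation decays with the time lag.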

In many longitudinal studies, individuals are not observed at all *T* occasions on account of some stochastic missing data mechanism. Here, we assume that all subjects are observed at baseline (*t* = 1). However, subjects can be missing at any follow-up time. It is convenient then to introduce (*T* − 1) random variables, *R _{it}*, (

$$\pi_{it}=\pi_{it}(Y_{it},\mathbf{x}_{it},\gamma)=\text{pr}(R_{it}=1\mid y_{it},\mathbf{x}_{it},\gamma)=\frac{\exp(\gamma_{0}+\gamma_{1}y_{it}+\gamma_{2}^{\prime}\mathbf{x}_{it})}{1+\exp(\gamma_{0}+\gamma_{1}y_{it}+\gamma_{2}^{\prime}\mathbf{x}_{it})}.$$

(2)

In this marginal model, if *γ*_{1} ≠ 0, then the missing data mechanism is non-ignorable, since the probability of being missing depends on possibly unobserved data *Y_{it}*. In the next section, we briefly discuss the pseudo-likelihoods of Troxel et al. and Fitzmaurice et al., and we describe our proposed estimator.
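The two marginal models (1) and (2) can be sketched in a few lines. This is only an illustration: the function names, the covariate layout (an intercept carried inside the covariate vector for the outcome model, and covariates without an intercept for the missingness model) and all numerical values are assumptions made for the example.

```python
import math


def expit(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def success_prob(x, beta):
    """Model (1): p_it = expit(x_it' beta); x is assumed to include a leading 1."""
    return expit(sum(xj * bj for xj, bj in zip(x, beta)))


def observed_prob(y, x, gamma0, gamma1, gamma2):
    """Model (2): pi_it = expit(gamma0 + gamma1 * y_it + gamma2' x_it)."""
    return expit(gamma0 + gamma1 * y + sum(xj * gj for xj, gj in zip(x, gamma2)))


# Hypothetical values, for illustration only.
p = success_prob([1.0, 1.0], [-0.25, 0.5])

# gamma1 = 0: the probability of being observed does not depend on y_it.
same = (observed_prob(0, [1.0], -0.5, 0.0, [0.3]),
        observed_prob(1, [1.0], -0.5, 0.0, [0.3]))

# gamma1 != 0: the probability depends on the possibly unobserved y_it.
differ = (observed_prob(0, [1.0], -0.5, 1.0, [0.3]),
          observed_prob(1, [1.0], -0.5, 1.0, [0.3]))
```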

In this section we review the pseudo-likelihood approach proposed by Troxel et al. [21] that uses an “independence working assumption,” i.e., assumes that observations are independent over time. The resulting pseudo-likelihood is a product of simple marginal terms and can be used to estimate the marginal regression parameters *β* and the marginal missingness parameters *γ*, but not the association parameters *α*. To describe this pseudo-likelihood, we let *f*(*y _{it}*,

$$f(y_{it},r_{it}\mid\mathbf{x}_{it},\beta,\gamma)=f(y_{it}\mid\mathbf{x}_{it},\beta)\,f(r_{it}\mid y_{it},\mathbf{x}_{it},\gamma),$$

where *f*(*y _{it}*|

$$f(y_{it},r_{it}\mid\mathbf{x}_{it},\beta,\gamma)$$

if *Y_{it}* were observed, and would be

$$\sum_{y_{it}=0}^{1}f(y_{it},r_{it}\mid\mathbf{x}_{it},\beta,\gamma)$$

if *Y_{it}* were missing.

The pseudo-likelihood [21], then, which treats the observations at different times as independent, is

$$\begin{array}{rl}\mathcal{L}_{\mathit{ind}}(\beta,\gamma)&=\prod_{i=1}^{n}\prod_{t=1}^{T}\left[f(y_{it},r_{it}\mid\mathbf{x}_{it},\beta,\gamma)\right]^{r_{it}}\left[\sum_{y_{it}=0}^{1}f(y_{it},r_{it}\mid\mathbf{x}_{it},\beta,\gamma)\right]^{(1-r_{it})}\\ &=\prod_{i=1}^{n}\prod_{t=1}^{T}\left[f(y_{it}\mid\mathbf{x}_{it},\beta)\,f(r_{it}\mid y_{it},\mathbf{x}_{it},\gamma)\right]^{r_{it}}\left[\sum_{y_{it}=0}^{1}f(y_{it}\mid\mathbf{x}_{it},\beta)\,f(r_{it}\mid y_{it},\mathbf{x}_{it},\gamma)\right]^{(1-r_{it})}\\ &=\prod_{i=1}^{n}\prod_{t=1}^{T}\left[f(y_{it}\mid\mathbf{x}_{it},\beta)\,\pi_{it}\right]^{r_{it}}\left[\sum_{y_{it}=0}^{1}f(y_{it}\mid\mathbf{x}_{it},\beta)(1-\pi_{it})\right]^{(1-r_{it})}.\end{array}$$

This pseudo-likelihood is simply a product of terms at each measurement occasion: when an observation is present, the Bernoulli probability function *f*(*y _{it}*|
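The structure of the pseudo-likelihood under independence can be sketched as a sum of log contributions, one per subject and occasion. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the record layout `(y, r, x)`, the convention that `x[0]` is an intercept term, and the function names are all hypothetical.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli(y, p):
    """Bernoulli probability function f(y | p) for y in {0, 1}."""
    return p if y == 1 else 1.0 - p


def log_pl_independence(records, beta, gamma):
    """Log pseudo-likelihood under the independence working assumption.

    records: iterable of (y, r, x); y may be None when r == 0 (missing).
    beta:    coefficients matching x (x[0] is assumed to be the intercept 1).
    gamma:   (gamma0, gamma1, gamma2), with gamma2 matching x[1:].
    """
    g0, g1, g2 = gamma
    total = 0.0
    for y, r, x in records:
        p = expit(sum(a * b for a, b in zip(x, beta)))

        def pi(y_value):
            # Probability of being observed given the (possibly missing) outcome.
            return expit(g0 + g1 * y_value + sum(a * b for a, b in zip(x[1:], g2)))

        if r == 1:
            # Observed: f(y | x, beta) * pi
            total += math.log(bernoulli(y, p) * pi(y))
        else:
            # Missing: sum over y of f(y | x, beta) * (1 - pi)
            total += math.log(sum(bernoulli(v, p) * (1.0 - pi(v)) for v in (0, 1)))
    return total


# Two hypothetical records: one observed outcome, one missing outcome.
toy = [(1, 1, [1.0, 0.0]), (None, 0, [1.0, 1.0])]
value = log_pl_independence(toy, beta=[0.0, 0.0], gamma=(0.0, 0.0, [0.0]))
```

With all parameters set to zero, every probability equals 0.5, so the observed record contributes log(0.25) and the missing record contributes log(0.5). In practice the function would be passed to a numerical optimizer over (*β*, *γ*).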

The maximum pseudo-likelihood estimate of Troxel et al. [21] under independence maximizes the log pseudo-likelihood, which can be obtained by setting the first derivative of the log pseudo-likelihood, i.e., the pseudo-score vector,

$$S_{\mathit{ind}}(\beta,\gamma)=\frac{\partial}{\partial(\beta,\gamma)}\log\mathcal{L}_{\mathit{ind}}(\beta,\gamma)=\sum_{i=1}^{n}S_{i,\mathit{ind}}(\beta,\gamma),$$

equal to **0** and solving for (* _{ind}*,

The pseudo-likelihood of Fitzmaurice et al. [22] under the protective assumption is a product of (*T* − 1) simple conditional distributions and the marginal distribution of the outcomes at time 1. Recall that we assume *Y _{i}*

$$\prod_{i=1}^{n}f(y_{i1}\mid\mathbf{x}_{i1},\beta)=\prod_{i=1}^{n}p_{i1}^{y_{i1}}(1-p_{i1})^{(1-y_{i1})}.$$

(3)

Note that since no data are missing at time 1, one could obtain a consistent, albeit inefficient, estimate of *β* (excluding time effects or interactions with time) from (3).

Next, consider the conditional probability of the outcome at time 1 given the outcome at time *t* (*t >* 1) (and that *Y_{it}* is observed),

$$f(y_{i1}\mid y_{it},\mathbf{x}_{it},R_{it}=1,\beta,\alpha,\gamma)=\frac{\text{pr}(R_{it}=1\mid y_{i1},y_{it},\mathbf{x}_{it},\gamma)\,f(y_{i1},y_{it}\mid\mathbf{x}_{it},\beta,\alpha)}{\sum_{y_{i1}}\text{pr}(R_{it}=1\mid y_{i1},y_{it},\mathbf{x}_{it},\gamma)\,f(y_{i1},y_{it}\mid\mathbf{x}_{it},\beta,\alpha)}.$$

Now suppose the conditional probability pr(*R _{it}* = 1|

$$\text{pr}(R_{it}=1\mid y_{i1},y_{it},\mathbf{x}_{it},\gamma)=\text{pr}(R_{it}=1\mid y_{it},\mathbf{x}_{it},\gamma).$$

(4)

This implies that the probability of being missing at a time-point can be predicted by all (or some combination) of the data at that time. Under (4),

$$\begin{array}{rl}f(y_{i1}\mid y_{it},\mathbf{x}_{it},r_{it}=1,\beta,\alpha,\gamma)&=\frac{\text{pr}(r_{it}=1\mid y_{it},\mathbf{x}_{it},\gamma)\,f(y_{i1},y_{it}\mid\mathbf{x}_{it},\beta,\alpha)}{\text{pr}(r_{it}=1\mid y_{it},\mathbf{x}_{it},\gamma)\sum_{y_{i1}}f(y_{i1},y_{it}\mid\mathbf{x}_{it},\beta,\alpha)}\\ &=\frac{f(y_{i1},y_{it}\mid\mathbf{x}_{it},\beta,\alpha)}{\sum_{y_{i1}}f(y_{i1},y_{it}\mid\mathbf{x}_{it},\beta,\alpha)}\\ &=f(y_{i1}\mid y_{it},\mathbf{x}_{it},\beta,\alpha).\end{array}$$

(5)

This implies that the conditional distribution of *Y _{i}*

$$\mathcal{L}_{\mathit{prot}}(\beta,\alpha\mid\mathbf{Y})=\prod_{i=1}^{n}\left(f(y_{i1}\mid\mathbf{x}_{i1},\beta)\prod_{t=2}^{T}\left[f(y_{i1}\mid y_{it},\mathbf{x}_{it},\beta,\alpha)\right]^{r_{it}}\right),$$

which includes all subjects at time 1 and *f*(*y _{i}*
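The key step behind the protective pseudo-likelihood, identity (5), can be checked numerically. The sketch below builds the bivariate distribution of the baseline outcome and one follow-up outcome from two Bernoulli margins and a marginal correlation, and then compares the conditional probability of the baseline outcome among observed cases with the same conditional probability in the full population. All names and numerical values are hypothetical and chosen only for illustration.

```python
import math


def bivariate_pmf(p1, pt, rho):
    """Joint pmf of (Y_1, Y_t) given the two success probabilities and Corr = rho."""
    p11 = p1 * pt + rho * math.sqrt(p1 * (1 - p1) * pt * (1 - pt))
    return {(1, 1): p11, (1, 0): p1 - p11, (0, 1): pt - p11,
            (0, 0): 1 - p1 - pt + p11}


def cond_y1_given_yt(pmf, yt, obs_prob=None):
    """pr(Y_1 = 1 | Y_t = yt), optionally restricted to observed cases (R_t = 1).

    obs_prob(y1, yt) returns pr(R_t = 1 | y1, yt); None means no selection.
    """
    def weight(y1):
        return 1.0 if obs_prob is None else obs_prob(y1, yt)

    numerator = weight(1) * pmf[(1, yt)]
    denominator = sum(weight(y1) * pmf[(y1, yt)] for y1 in (0, 1))
    return numerator / denominator


def protective(y1, yt):
    # Missingness depends on the current outcome only, so (4) holds.
    return 0.9 if yt == 1 else 0.5


def violating(y1, yt):
    # Missingness depends on the baseline outcome, so (4) fails.
    return 0.9 if y1 == 1 else 0.5


pmf = bivariate_pmf(0.4, 0.6, 0.3)  # hypothetical margins and correlation
full = cond_y1_given_yt(pmf, 1)
under_protective = cond_y1_given_yt(pmf, 1, protective)
under_violation = cond_y1_given_yt(pmf, 1, violating)
```

Under the protective mechanism the selection probability cancels from numerator and denominator, so the conditional probability among observed cases equals the population value; when missingness also depends on the baseline outcome, the two no longer agree.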

The maximum pseudo-likelihood can again be obtained by setting the pseudo-score vector,

$$S_{\mathit{prot}}(\beta,\alpha)=\frac{\partial}{\partial(\beta,\alpha)}\log\mathcal{L}_{\mathit{prot}}(\beta,\alpha)=\sum_{i=1}^{n}S_{i,\mathit{prot}}(\beta,\alpha),$$

equal to **0** and solving for (* _{prot}*,

The two approaches described above require different but in each case non-trivial assumptions related to the missing data mechanism. The independence approach of Troxel et al. requires *correct* specification of a missingness model in which the missingness probabilities at a given time may depend only on outcomes at that time. The protective approach of Fitzmaurice et al. requires that the missingness probabilities at a given time may depend on outcomes at that time but *must not* depend on outcomes observed at baseline, an assumption that obviates the need to specify the missing data model directly. While the protective assumption in (4) is not the most general non-ignorable missing data mechanism, it is still non-ignorable due to dependence of *R _{it}* on the outcomes at time

There are numerous scenarios in which both models would appear to be reasonable, for example, repeated assessments of highly correlated indicators of symptom occurrence such as tingling of the hands and feet in cancer patients receiving chemotherapy. One can plausibly hypothesize that the *current* occurrence of the symptom almost entirely determines the patient’s ability to attend the clinic and thus have the symptom measured; one can equally plausibly be confident of modeling correctly (or nearly correctly) the predictors of missingness, including the symptom value itself but also numerous other known complicating factors such as patient age, presence of family members to assist, treatment with anthracycline-based chemotherapy, etc.

Of greater interest are scenarios in which one set of assumptions holds but not the other. Consider, for example, a setting in which the likelihood of missingness depends on both the current assessment and the baseline value. This is plausible in the setting of quality of life where difficulty coping at baseline is often indicative of later missingness, but difficulty coping at later assessments also increases the likelihood of being missing; in addition, repeated assessments of quality of life tend to be highly variable and poorly correlated. In this setting, the protective assumption is violated, and the low correlation means that the protective estimator will not be robust to the violation; while the missingness model using the independence approach will also be misspecified by not including the baseline values, it will still correctly capture the nonignorability and thus suffer minimal bias. On the other hand, there are many scenarios in which the protective assumption is satisfied, but the missingness model in the independence approach is so badly misspecified that the subsequent estimates will be biased. In the coping example above, we might specify a model in which those with difficulty coping are more likely to be missing. In reality, however, it may be that both those with very poor coping and very high levels of coping may be equally likely to be missing, the former because they can’t manage their disease and the latter because they see no need for follow-up care. Subjects with missing values will be a mixture of these two populations; the monotone model for missingness that links difficulty coping with higher rates of missingness will fail to capture the higher rates of missingness among a subset of those who are coping well, resulting in biased estimates.

As shown in the previous subsections, under the protective assumption given in (4), and assuming the missingness probability given in (2) is correctly modeled, both the estimate * _{ind}* from Troxel et al.’s pseudo-likelihood, and the estimate

First, note that under the protective assumption, the joint asymptotic distribution of (* _{ind}*,

$${\left[\begin{array}{cc}{\mathcal{I}}_{\mathit{ind}}& 0\\ 0& {\mathcal{I}}_{\mathit{prot}}\end{array}\right]}^{-1}\sum _{i=1}^{n}E\left(\left[\begin{array}{c}{S}_{i,\mathit{ind}}(\beta ,\gamma )\\ {S}_{i,\mathit{prot}}(\beta ,\alpha )\end{array}\right]\phantom{\rule{0.16667em}{0ex}}{\left[\begin{array}{c}{S}_{i,\mathit{ind}}(\beta ,\gamma )\\ {S}_{i,\mathit{prot}}(\beta ,\alpha )\end{array}\right]}^{\prime}\right){\left[\begin{array}{cc}{\mathcal{I}}_{\mathit{ind}}& 0\\ 0& {\mathcal{I}}_{\mathit{prot}}\end{array}\right]}^{-1},$$

where

$$\mathcal{I}_{\mathit{ind}}=E\left[\frac{\partial}{\partial(\beta,\gamma)}S_{\mathit{ind}}(\beta,\gamma)\right],$$

and

$$\mathcal{I}_{\mathit{prot}}=E\left[\frac{\partial}{\partial(\beta,\alpha)}S_{\mathit{prot}}(\beta,\alpha)\right].$$

The variance estimate is obtained by replacing (*β*, *γ*) in *S _{i,ind}*(
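The joint sandwich covariance above can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' code: we assume the per-subject pseudo-score vectors have already been evaluated at the estimates and stacked row-wise, that the expectation of the outer product is replaced by its empirical sum, and that the two derivative matrices are supplied by the user. The function name and the random inputs are hypothetical.

```python
import numpy as np


def sandwich_cov(scores, info_ind, info_prot):
    """Joint sandwich covariance for the stacked (ind, prot) parameter vectors.

    scores:    (n, q1 + q2) array; row i is [S_i,ind ; S_i,prot] at the estimates.
    info_ind:  (q1, q1) derivative matrix of the summed independence pseudo-score.
    info_prot: (q2, q2) derivative matrix of the summed protective pseudo-score.
    """
    q1, q2 = info_ind.shape[0], info_prot.shape[0]
    bread = np.zeros((q1 + q2, q1 + q2))
    bread[:q1, :q1] = info_ind      # block-diagonal "bread"
    bread[q1:, q1:] = info_prot
    bread_inv = np.linalg.inv(bread)
    meat = scores.T @ scores        # empirical sum over i of S_i S_i'
    # Equals bread_inv @ meat @ bread_inv when the bread is symmetric.
    return bread_inv @ meat @ bread_inv.T


# Hypothetical inputs: 200 subjects, two parameters per pseudo-likelihood.
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 4))
V = sandwich_cov(scores, 200.0 * np.eye(2), 200.0 * np.eye(2))
```

Because the "meat" is built from the stacked scores, the off-diagonal blocks of `V` capture the covariance between the two sets of estimates, which is what the weighted combination needs.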

Under the protective assumption,

$$E\left[\begin{array}{c}{\widehat{\beta}}_{\mathit{ind}}\\ {\widehat{\beta}}_{\mathit{prot}}\end{array}\right]=\left[\begin{array}{c}{\mathbf{I}}_{J}\\ {\mathbf{I}}_{J}\end{array}\right]\beta =\mathbf{Z}\beta ,$$

where **I*** _{J}* is a (

We propose forming a new estimator that is the asymptotic minimum variance linear combination of * _{ind}* and

$${\widehat{\beta}}_{\mathit{wls}}={[{\mathbf{Z}}^{\prime}{\widehat{V}}_{\beta}^{-1}\mathbf{Z}]}^{-1}[{\mathbf{Z}}^{\prime}{\widehat{V}}_{\beta}^{-1}{({\widehat{\beta}}_{\mathit{ind}}^{\prime},{\widehat{\beta}}_{\mathit{prot}}^{\prime})}^{\prime}],$$

which has asymptotic variance estimated by

$$\widehat{\mathit{Var}}({\widehat{\beta}}_{\mathit{wls}})={[{\mathbf{Z}}^{\prime}{\widehat{V}}_{\beta}^{-1}\mathbf{Z}]}^{-1}.$$

The estimate * _{wls}* has the minimum asymptotic variance of any linear combination of
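The weighted least squares combination and its variance estimate can be computed directly from the two estimates and their joint covariance matrix. The sketch below is illustrative only: the function name `wls_combine` and the numerical inputs are hypothetical, and the covariance matrix is assumed to be symmetric and positive definite.

```python
import numpy as np


def wls_combine(beta_ind, beta_prot, V):
    """Minimum-variance linear combination of two estimates of the same beta.

    beta_ind, beta_prot: length-J estimates of beta.
    V: (2J, 2J) estimated covariance of the stacked vector (beta_ind', beta_prot')'.
    Returns (beta_wls, var_wls).
    """
    beta_ind = np.asarray(beta_ind, dtype=float)
    beta_prot = np.asarray(beta_prot, dtype=float)
    J = beta_ind.size
    Z = np.vstack([np.eye(J), np.eye(J)])      # Z = [I_J ; I_J]
    stacked = np.concatenate([beta_ind, beta_prot])
    Vinv_Z = np.linalg.solve(V, Z)             # V^{-1} Z
    var_wls = np.linalg.inv(Z.T @ Vinv_Z)      # [Z' V^{-1} Z]^{-1}
    beta_wls = var_wls @ (Vinv_Z.T @ stacked)  # uses symmetry of V
    return beta_wls, var_wls


# Hypothetical example with J = 2: the first estimator is imprecise for the
# first coefficient, the second estimator is imprecise for the second.
V = np.diag([4.0, 1.0, 1.0, 4.0])
beta_wls, var_wls = wls_combine([1.0, 2.0], [2.0, 1.0], V)
```

In this toy case the two estimators are uncorrelated, so each coefficient is an inverse-variance weighted average with weights 0.2 and 0.8, and the combined variance (0.8) is smaller than the smaller of the two input variances (1.0).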

We present an analysis of the CD4 count data from the AIDS clinical trials described in the Introduction. The parameters are estimated using the protective pseudo-likelihood, the non-ignorable pseudo-likelihood under independence, WLS, and generalized estimating equations (GEE) [2] under ignorable assumptions, described below. The two AIDS clinical trials are randomized phase III double-blind trials, designed to compare two therapeutic treatments: zidovudine (AZT) and didanosine (ddI); the dataset contains records on *n* = 431 patients diagnosed with AIDS or AIDS-related complex. The response of interest at time (week) *t* = 0, 1, …, 5 is the patient’s CD4 count sufficiency, with *Y_{it}* = 1 if the CD4 count exceeds 200 and 0 otherwise. As discussed in the Introduction and given in Table I, CD4 count data are missing for 11% to 34% of patients at the five follow-up occasions; moreover, the missing data patterns are non-monotone.

To describe the treatment effect, we form the following indicator variable

$${\text{AZT}}_{i}=\{\begin{array}{l}1\phantom{\rule{0.16667em}{0ex}}\text{if}\phantom{\rule{0.16667em}{0ex}}\text{the}\phantom{\rule{0.16667em}{0ex}}{i}^{th}\phantom{\rule{0.16667em}{0ex}}\text{subject}\phantom{\rule{0.16667em}{0ex}}\text{is}\phantom{\rule{0.16667em}{0ex}}\text{randomized}\phantom{\rule{0.16667em}{0ex}}\text{to}\phantom{\rule{0.16667em}{0ex}}\text{AZT}\hfill \\ 0\phantom{\rule{0.16667em}{0ex}}\text{if}\phantom{\rule{0.16667em}{0ex}}\text{the}\phantom{\rule{0.16667em}{0ex}}{i}^{th}\phantom{\rule{0.16667em}{0ex}}\text{subject}\phantom{\rule{0.16667em}{0ex}}\text{is}\phantom{\rule{0.16667em}{0ex}}\text{randomized}\phantom{\rule{0.16667em}{0ex}}\text{to}\phantom{\rule{0.16667em}{0ex}}\text{ddI}\hfill \end{array}.$$

Because of the stratified randomization, to control for baseline age we define the indicator variable

$${\text{age}}_{i}=\{\begin{array}{l}1\phantom{\rule{0.16667em}{0ex}}\text{if}\phantom{\rule{0.16667em}{0ex}}\text{the}\phantom{\rule{0.16667em}{0ex}}{i}^{th}\phantom{\rule{0.16667em}{0ex}}\text{subject}\phantom{\rule{0.16667em}{0ex}}\text{has}\phantom{\rule{0.16667em}{0ex}}\text{baseline}\phantom{\rule{0.16667em}{0ex}}\text{age}\ge 35\hfill \\ 0\phantom{\rule{0.16667em}{0ex}}\text{otherwise}\hfill \end{array}.$$

We model the logit of *p _{it}* =

$$\text{logit}({p}_{it})={\beta}_{0}+{\beta}_{1}{\text{age}}_{i}+{\beta}_{3}t+{\beta}_{4}t{\text{AZT}}_{i},$$

for *t* = 0, 1, …, 5. Note the exclusion of a main effect of treatment (AZT* _{i}*). The main effect of AZT corresponds to the baseline (
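To make the role of the omitted treatment main effect concrete, the following sketch evaluates the fitted probability implied by the model for the two arms. The coefficient values are hypothetical and are not the estimates reported in Table II; the function name and argument names are ours.

```python
import math


def cd4_success_prob(age, azt, t, b0, b_age, b_time, b_time_azt):
    """pr(CD4 > 200 at week t) under the logit model with no AZT main effect."""
    eta = b0 + b_age * age + b_time * t + b_time_azt * t * azt
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, for illustration only.
coef = dict(b0=-1.0, b_age=0.4, b_time=-0.2, b_time_azt=-0.1)

baseline_azt = cd4_success_prob(age=1, azt=1, t=0, **coef)
baseline_ddi = cd4_success_prob(age=1, azt=0, t=0, **coef)

# Difference in log odds between arms at week 3.
gap_week3 = (logit(cd4_success_prob(age=1, azt=1, t=3, **coef))
             - logit(cd4_success_prob(age=1, azt=0, t=3, **coef)))
```

At *t* = 0 the two arms share the same probability, and at later weeks the log odds differ by *t* times the time-by-treatment coefficient, so the treatment comparison is carried entirely by the interaction term.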

Recall that the protective pseudo-likelihood requires specification of the correlations, *ρ*_{1*t*}. We estimated the parameters under both AR(1) and exchangeable correlations; the results were so similar that for simplicity, we present results under an exchangeable assumption. Further, recall that for the non-ignorable pseudo-likelihood under independence, we must model the probability of being observed at each time point. It was conjectured that CD4 count is nonignorably missing since sicker patients may not come in for a further GP visit, e.g., sicker patients may have been hospitalized. We considered the following missing data mechanism:

$$\begin{array}{rl}\text{logit}(\pi_{it})&=\text{logit}[\text{pr}(R_{it}=1\mid y_{it},\mathbf{x}_{it},\gamma)]\\ &=\gamma_{0}+\gamma_{1}y_{it}+\gamma_{2}\text{AZT}_{i}+\gamma_{3}\text{age}_{i}+\gamma_{4}t+\gamma_{12}y_{it}\text{AZT}_{i}+\gamma_{14}y_{it}t,\end{array}$$

(6)

for *t >* 0. Using the pseudo-likelihood approach in (6), both the *y _{it}*AZT

Table II displays estimates and standard errors for the parameters *β* for all models and methods. Note that the WLS estimator of the *k ^{th}* element of

From Table II, we see that, among the non-ignorable approaches, the estimates are similar, except for the AGE effect using the protective estimate, which is over 50% smaller (and, as discussed above, also over 50% more variable). Comparing GEE to the non-ignorable approaches, we see that the GEE estimate of the time by treatment interaction is much smaller than the estimate using the non-ignorable approaches. This also highlights how different assumptions about the missing data mechanism can produce discernibly different, and possibly conflicting, estimates of effects.

We compared the WLS estimator, the protective estimator, the pseudo-likelihood estimator under independence, the ML estimator using the correct non-ignorable missingness mechanism, and GEE under an ignorable missing data mechanism. To ensure feasibility of the simulation study, we restricted the number of occasions to *T* = 3 and considered a simple two-group study design configuration (e.g., evenly randomized between active treatment and placebo).

Let *x _{i}* = 0, 1 indicate treatment group membership. The binary outcomes, denoted by (

$$\text{pr}(Y_{i1}=y_{i1},Y_{i2}=y_{i2},Y_{i3}=y_{i3}\mid x_{it},\beta,\alpha)=\left\{\prod_{t=1}^{3}p_{it}^{y_{it}}(1-p_{it})^{(1-y_{it})}\right\}\left\{1+\rho_{12}z_{i1}z_{i2}+\rho_{13}z_{i1}z_{i3}+\rho_{23}z_{i2}z_{i3}+\rho_{123}z_{i1}z_{i2}z_{i3}\right\},$$

where

$$\begin{array}{rl}Z_{it}&=\frac{Y_{it}-p_{it}}{\sqrt{p_{it}(1-p_{it})}},\\ \rho_{st}&=\text{Corr}(Y_{is},Y_{it})=\frac{E[(Y_{is}-p_{is})(Y_{it}-p_{it})\mid x_{i}]}{\sqrt{p_{is}(1-p_{is})p_{it}(1-p_{it})}},\\ \rho_{123}&=\frac{E[(Y_{i1}-p_{i1})(Y_{i2}-p_{i2})(Y_{i3}-p_{i3})\mid x_{i}]}{\sqrt{p_{i1}(1-p_{i1})p_{i2}(1-p_{i2})p_{i3}(1-p_{i3})}},\\ \text{logit}(p_{it})&=\beta_{0}+\beta_{x}x_{i}+\beta_{\tau}(t-1),\end{array}$$

for *t* = 1, 2, 3. We group *α* = [*ρ*_{12}, *ρ*_{13}, *ρ*_{23}, *ρ*_{123}]′. For the simulation study, we choose *β*_{0} = −0.25, *β _{x}* = 0.5, and
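The joint distribution used to generate the three binary outcomes can be tabulated directly from its definition. The sketch below evaluates the eight cell probabilities for one treatment group. The values of *β*_{0} and *β_{x}* are those stated in the text; the time coefficient and the correlation values are hypothetical placeholders chosen for illustration, and the function names are ours.

```python
import math
from itertools import product


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def joint_pmf(p, rho12, rho13, rho23, rho123):
    """Joint pmf of (Y1, Y2, Y3): independence product times a correlation factor."""
    pmf = {}
    for y in product((0, 1), repeat=3):
        indep = math.prod(pt if yt == 1 else 1.0 - pt for yt, pt in zip(y, p))
        # Standardized outcomes z_t = (y_t - p_t) / sqrt(p_t (1 - p_t)).
        z = [(yt - pt) / math.sqrt(pt * (1.0 - pt)) for yt, pt in zip(y, p)]
        factor = (1.0 + rho12 * z[0] * z[1] + rho13 * z[0] * z[2]
                  + rho23 * z[1] * z[2] + rho123 * z[0] * z[1] * z[2])
        pmf[y] = indep * factor
    return pmf


beta0, beta_x = -0.25, 0.5   # values stated in the text
beta_tau = 0.25              # hypothetical value, for illustration only
x = 1                        # treatment group indicator

p = [expit(beta0 + beta_x * x + beta_tau * (t - 1)) for t in (1, 2, 3)]
pmf = joint_pmf(p, rho12=0.3, rho13=0.3, rho23=0.3, rho123=0.0)  # hypothetical
```

The cell probabilities sum to one and reproduce the specified margins for any correlation values, but they are nonnegative only for a restricted range of correlations, so a check such as `min(pmf.values()) > 0` is worth running before simulating from the table.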

We performed simulations with the following true non-ignorable missingness mechanism,

$$\text{logit}(\pi_{it})=\text{logit}[\text{pr}(R_{it}=1\mid y_{it},x_{it},\gamma)]=\gamma_{0}+\gamma_{1}x_{i}+\gamma_{2}(t-1)+\gamma_{3}y_{it},$$

(7)

for *t >* 1, and we let the missingness indicators be independent at the three occasions. For the simulation study, the true model parameters in (7) are *γ*_{0} = −0.5, *γ*_{1} = 1.0, *γ*_{2} = 0.2, and *γ*_{3} = 1.0. Here, missingness at a given occasion depends upon group membership, time, and the possibly missing outcome at that occasion. In this mechanism, non-monotone missingness can occur in that an outcome can be missing at time *s* (*R _{is}* = 0), but observed at a future time

$$\text{pr}[R_{i1}=r_{i1},R_{i2}=r_{i2},R_{i3}=r_{i3}\mid y_{i1},y_{i2},y_{i3},x_{it},\gamma]=\prod_{t=2}^{3}\pi_{it}^{r_{it}}(1-\pi_{it})^{(1-r_{it})}.$$
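The missingness mechanism in (7) can be simulated in a few lines. The sketch below uses the parameter values stated in the text and draws the indicators independently at the two follow-up occasions, with the baseline outcome always observed. The function names and the random seed are our own choices, and the code is an illustration rather than the simulation program used for Table III.

```python
import math
import random

# (gamma0, gamma1, gamma2, gamma3) as stated in the text for model (7).
GAMMA = (-0.5, 1.0, 0.2, 1.0)


def observed_prob(x, t, y, gamma=GAMMA):
    """pr(R_it = 1 | y_it, x_i) under (7), for follow-up occasions t > 1."""
    g0, g1, g2, g3 = gamma
    return 1.0 / (1.0 + math.exp(-(g0 + g1 * x + g2 * (t - 1) + g3 * y)))


def draw_indicators(x, y, rng):
    """Draw (R_1, R_2, R_3) for outcomes y = (y1, y2, y3); R_1 is always 1."""
    r = [1]
    for t in (2, 3):
        r.append(1 if rng.random() < observed_prob(x, t, y[t - 1]) else 0)
    return tuple(r)


rng = random.Random(2010)  # arbitrary seed
pattern = draw_indicators(x=1, y=(1, 0, 1), rng=rng)
```

Because the indicators at occasions 2 and 3 are drawn independently, a pattern such as (1, 0, 1) has positive probability, which is how non-monotone missingness arises in this design.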

In the simulations reported in Table III, all of the non-ignorable methods are approximately unbiased, whereas GEE is clearly biased. The main interest of this simulation is to explore the efficiency gains of WLS over the protective and pseudo-likelihood estimator under independence. We provide both the average of the estimated variance and the empirical simulation variance; in general they match closely, except for the protective estimator when the correlation is low and the variance is poorly estimated. We see that the WLS estimator displays considerable gains in efficiency over the protective estimator for both *β _{τ}* and

We have proposed a weighted least squares estimator (WLS) which is an optimal combination of * _{ind}* and

Because of the broad range of possible missing data configurations and underlying probability distributions generating the data, it is difficult to draw definitive conclusions from simulation studies, and we can make only general suggestions. Based on our simulation studies, however, we have shown that one can take two relatively inefficient estimators (the protective and pseudo-likelihood under independence), and create a highly efficient estimator in the WLS estimator.

The authors are grateful for constructive comments from two reviewers, and for the support provided by the following grants from the US National Institutes of Health: AI 60373, GM 29745, CA 74015, CA 70101, MH 054693, and CA 68484. Andrea Troxel gratefully acknowledges support from the Columbia University Institute for Scholars at Reid Hall, Paris. Geert Molenberghs gratefully acknowledges financial support from the Belgian Science Policy IAP research network #P6/03.

1. Cox DR. The analysis of multivariate binary data. Applied Statistics. 1972;21:113–20.

2. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.

3. Le Cessie S, Van Houwelingen JC. Logistic regression for correlated binary data. Applied Statistics. 1994;43:95–108.

4. Meester SG, MacKay J. A parametric model for cluster correlated categorical data. Biometrics. 1994;50:954–963.

5. Molenberghs G, Lesaffre E. Marginal modelling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association. 1994;89:633–644.

6. Little RJ, Rubin DB. Statistical Analysis with Missing Data. Wiley & Sons; New York: 1987.

7. Baker SG. Marginal regression for repeated binary data with outcomes subject to nonignorable nonresponse. Biometrics. 1995;51:1042–1052.

8. Baker SG, Laird NM. Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association. 1988;83:62–69.

9. Diggle P, Kenward MG. Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics. 1994;43:49–93.

10. Heagerty PJ. Marginally specified logistic-normal models for longitudinal binary data. Biometrics. 1999;55:688–698.

11. Ibrahim JG, Chen MH, Lipsitz SR. Missing responses in generalized linear mixed models when the missing data mechanism is nonignorable. Biometrika. 2001;88:551–564.

12. Kahn JO, Lagakos SW, Richman DD. AIDS Clinical Trials Group. A controlled trial comparing continued zidovudine with didanosine in human immunodeficiency virus infection. New England Journal of Medicine. 1992;327:581–7.

13. Gallant JE, Moore RD, Richman DD, Keruly J, Chaisson RE. Zidovudine Epidemiology Study Group. Incidence and natural history of cytomegalovirus disease in patients with advanced human immunodeficiency virus disease treated with zidovudine. Journal of Infectious Diseases. 1992;166:1223–7.

14. Finkelstein DM, Williams PL, Molenberghs G, Feinberg J, Powderly WG, Kahn J, Dolin R, Cotton D. Patterns of opportunistic infections in patients with HIV infection. Journal of Acquired Immune Deficiency Syndromes & Human Retrovirology. 1996;12:38–45.

15. Phair J, Munoz A, Detels R, Kaslow R, Rinaldo C, Saah A. the Multicenter AIDS Cohort Study Group. The risk of *Pneumocystis carinii* pneumonia among men infected with human immunodeficiency virus type 1. New England Journal of Medicine. 1990;322:161–5.

16. Fitzmaurice G, Molenberghs G, Lipsitz SR. Regression models for longitudinal binary responses with informative drop-outs. Journal of the Royal Statistical Society, Series B. 1995;57:691–704.

17. Gill R, Robins JM. Sequential models for coarsening and missingness. In: Lin DY, Fleming TR, editors. Proceedings of the First Seattle Symposium on Biostatistics: Survival Analysis. Springer-Verlag; New York: 1997. pp. 295–305.

18. Robins JM, Gill R. Non-response models for the analysis of non-monotone ignorable missing data. Statistics in Medicine. 1997;16:39–56.

19. Gong G, Samaniego F. Pseudo maximum likelihood estimation: theory and applications. Annals of Statistics. 1981;9:861–869.

20. Liang K-Y, Self SG. On the asymptotic behavior of the pseudolikelihood ratio test statistic. Journal of the Royal Statistical Society, Series B. 1996;58:785–796.

21. Troxel AB, Lipsitz SR, Harrington DP. Marginal models for the analysis of longitudinal measurements subject to nonignorable non-monotone missing data. Biometrika. 1998;85:661–672.

22. Fitzmaurice G, Lipsitz SR, Molenberghs G, Ibrahim JG. A protective estimator for longitudinal binary data subject to non-ignorable non-monotone missingness. Journal of the Royal Statistical Society, Series A. 2005;168:723–735.

23. White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982;50:1–26.

24. Brown CH. Protecting against nonrandomly missing data in longitudinal studies. Biometrics. 1990;46:143–155.

25. Wei LJ, Johnson WE. Combining dependent tests with incomplete repeated measurements. Biometrika. 1985;72:359–364.

26. Bloch DA, Moses LE. Nonoptimally weighted least squares. The American Statistician. 1988;42:50–53.

27. Schabenberger O. Introducing the GLIMMIX procedure for generalized linear mixed models. Proceedings of the Thirtieth Annual SAS Users Group International Conference; Cary, NC: SAS Institute Inc; 2005. Paper 196–30.

28. Plackett RL. A class of bivariate distributions. Journal of the American Statistical Association. 1965;60:516–522.

29. Bahadur RR. A representation of the joint distribution of responses to n dichotomous items. In: Solomon H, editor. Studies in Item Analysis and Prediction. Stanford University Press; 1961. pp. 158–68.

30. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the *EM* algorithm (with discussion). Journal of the Royal Statistical Society, Series B. 1977;39:1–38.

31. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society, Series B. 1983;45:212–18.

32. Nordheim EV. Inference from nonrandomly missing categorical data: an example from a genetic study in Turner’s syndrome. Journal of the American Statistical Association. 1984;79:772–80.

33. Scharfstein D, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semiparametric non-response models (with discussion). Journal of the American Statistical Association. 1999;94:1096–1146.
