Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2791415

Formats

Article sections

- Summary
- 1. Introduction
- 2. Model
- 3. Computational Details
- 4. Example
- 5. Simulation Study
- 6. Discussion
- References

Authors

Related links

Biometrics. Author manuscript; available in PMC 2009 December 10.

Published in final edited form as:

Published online 2007 September 26. doi: 10.1111/j.1541-0420.2007.00884.x

PMCID: PMC2791415

NIHMSID: NIHMS143240

Jason Roy, Center for Health Research, Geisinger Health System, Danville, Pennsylvania 17822, U.S.A. Email: ude.regnisieg@yoraj;

The publisher's final edited version of this article is available at Biometrics

See other articles in PMC that cite the published article.

In this article we consider the problem of fitting pattern mixture models to longitudinal data when there are many unique dropout times. We propose a marginally specified latent class pattern mixture model. The marginal mean is assumed to follow a generalized linear model, whereas the mean conditional on the latent class and random effects is specified separately. Because the dimension of the parameter vector of interest (the marginal regression coefficients) does not depend on the assumed number of latent classes, we propose to treat the number of latent classes as a random variable. We specify a prior distribution for the number of classes, and calculate (approximate) posterior model probabilities. In order to avoid the complications with implementing a fully Bayesian model, we propose a simple approximation to these posterior probabilities. The ideas are illustrated using data from a longitudinal study of depression in HIV-infected women.

Dropout is a common occurrence in longitudinal studies. Missingness induced by dropout that depends only on the observed data is called missing at random (MAR) or random dropout. If missingness depends on the unobserved response at the time of dropout or at future times, even after conditioning on the observed data, then the missingness is called nonignorable or informative dropout (Little, 1995). There are many model-based approaches to deal with informative dropout that are characterized by how they factor the joint distribution of missingness and the response. We will focus on the pattern mixture approach. Pattern mixture models (PMM) are a flexible and transparent way to analyze incomplete longitudinal data where the missingness is nonignorable (Little, 1994; Hogan and Laird, 1997). The typical approach taken in PMM is to stratify on dropout time (i.e., the pattern) and assume that missing data within a pattern are MAR. Consider the case of *T* unique dropout times and define *D _{i}* to be the dropout time and

The other issue we will address are situations where the number of unique dropout times *T* is large. In this setting stratification by dropout pattern may lead to sparse patterns, which will lead to unstable parameter estimates (or unidentified parameters) in those patterns. There are several ways to remedy this including allowing parameters to be shared across patterns (Hogan and Laird, 1997) or to group the dropout times into *m*<*T* groups in an ad hoc fashion (Hogan, Roy and Korkontzelou, 2004). Roy (2003) proposed an automated way to do the latter using a latent variable approach within the context of normal models for continuous data. This approach assumes the existence of a discrete latent variable that explains the dependence between the response vector and the dropout time and allows incorporation of uncertainty about the groupings, conditional on a fixed number of groups. We will extend the approach of Roy (2003) by incorporating uncertainty in the number of classes through (approximate) Bayesian model averaging.

A common way to account for the longitudinal correlation in the vector of responses for subject *i*, *Y*_{i} is to introduce random effects. However, for nonlinear link functions, similar to the above discussion, the link no longer holds for marginal covariate effects (Diggle et al., 2002). We will use the ideas in Heagerty (1999) within our model to directly model the marginal covariate effects. We briefly review Heagerty's approach below.

Let *Y _{it}* denote the response for the

$$\text{logit}\left\{P({Y}_{it}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}\beta )\right\}={X}_{it}^{T}\beta .$$

(1)

Then the dependence among the *Y _{it}* is specified via a conditional model that is consistent with (1),

$$\text{logit}\left\{P({Y}_{it}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{b}_{i})\right\}={\Delta}_{it}+{b}_{i},$$

(2)

where *b _{i}* ~

$$P({Y}_{it}=1)=\int P({Y}_{it}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{b}_{i})dF\left({b}_{i}\right).$$

Note that Δ* _{it}* is a function of

This work is widely applicable, but was motivated by an HIV natural history study of depression. The HIV Epidemiology Research Study (HERS; Smith et al., 1997) was a longitudinal study of women with, or at high risk for, HIV infection. Data were collected from 1310 women at baseline. Investigators then attempted to collect data from each subject every 6 months for a total of 6 years. Thus, 12 total visits from each subject would be obtained if there were no missing data. Our interest is in studying the course of depression in the 849 women who had HIV infection at baseline. Depression was treated as a binary, yes/no, variable (Cook et al., 2004). A challenge with the analysis of these data is that less than half of these women remained in the study until the end. It is not hard to imagine a scenario where the course of depression over time might vary as a function of dropout time. Because there are many unique dropout times (12), some of which include very few subjects, we apply the latent class pattern mixture modeling approach to the analysis of these data.

In Section 2 we introduce the model. We provide computational details in Section 3. The example is analyzed in Section 4. A brief simulation study is given in Section 5. We conclude with a discussion in Section 6.

Before we introduce the model, we first go through some additional notation needed for the latent class component. Define *S _{i}* = (

All of the parameters in the following specification are a function of the number of latent classes, *M*; for example, *β*^{(M)}. However, we suppress the superscripts without loss of clarity in the following. First, we specify the marginal mean as

$$g\left\{\mathrm{E}({Y}_{it}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}\beta )\right\}={X}_{it}^{T}\beta .$$

(3)

By marginal, we mean marginalized over subject-specific random effects *and* over the latent class distribution (implicitly over the dropout distribution as well). If the number of classes *M* were known, then the parameters *β* would be of primary interest. We address the issue of *M* being unknown below.

In order to fully account for correlation due to repeated observations and informative censoring, we specify a conditional model in addition to the marginal model. Recall that we are taking a pattern mixture modeling approach to account for dropout. We assume that the relevant information in *D* is captured by the latent variable *S*. We therefore specify a mixture distribution over these latent classes, as opposed to over *D* itself. Before proceeding to describe the model, however, we first make two points. First, the parameters from the conditional model are not of scientific interest, and in fact are viewed as nuisance parameters; we are not interested in estimating subject-specific effects (i.e., effects conditional on the random effects) or class-specific covariate effects (i.e., effects of covariates on *Y* given a particular dropout class). Second, we must specify the conditional model in a way that is compatible with the marginal model (3). As we will see below, this leads to a somewhat complicated model. Specifying this conditional model is necessary, however, in order to account for the two types of dependencies (within-subject correlation and dependency between the outcome and dropout time).

We assume the data *Y _{it}*, conditional on random effects

$$f({Y}_{it}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{b}_{i},{S}_{i})=\text{exp}[\{{Y}_{it}{\eta}_{it}-\psi \left({\eta}_{it}\right)\}\u2215\left({m}_{i}\varphi \right)+h({Y}_{it},\varphi )],$$

where E(*Y _{it}*|

$$g\left\{\mathrm{E}({Y}_{it}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{b}_{i},{S}_{i})\right\}={\Delta}_{it}+{b}_{i}+\sum _{j=1}^{M}{S}_{ij}{Z}_{it}^{T}{\alpha}^{\left(j\right)},$$

(4)

where, in the most general form of the model we allow the variance of *b _{i}* to depend on the latent class, that is, [

The probabilities of the latent classes given the dropout time are specified as a proportional odds model,

$$\begin{array}{cc}\hfill \text{logit}\left\{P\left(\sum _{j=1}^{k}{S}_{ij}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{D}_{i}\right)\right\}& ={\lambda}_{0k}+{\lambda}_{1}{D}_{i},\hfill \\ \hfill k& =1,\dots ,M-1,\hfill \end{array}$$

(5)

where *λ*_{01} ≤ *λ*_{02} ≤ ··· ≤ λ_{0,M–1} and *λ*_{1} are unknown parameters. From this regression (5) it is clear that the class probabilities are a monotone function of dropout time (in fact, linear on the logit scale). Finally, the dropout times, *D _{i}*, follow a multinomial distribution with mass at each of the possible dropout times, parameterized by

We point out that in the above formulation, *Y _{it}* is independent of

The intercept Δ* _{it}* in (4) is determined by the relationship between (3) and (4), namely, the solution to

$$\mathrm{E}({Y}_{it}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}\beta )=\sum _{D}\sum _{S}p({S}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{D}_{i})p\left({D}_{i}\right)\int \mathrm{E}({Y}_{it}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{b}_{i},{S}_{i})p({b}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{S}_{i})\phantom{\rule{thinmathspace}{0ex}}d{b}_{i}.$$

The main target of inference typically will be the covariate effects averaged over classes, that is, *β*^{(M)} averaged over *M*. We denote this as *β** = ∑_{m }*β*^{(m) }*p*(*m*|*y*). We discuss computation of *p*(*m*|*y*) in Section 3.3 and the corresponding computation of $\text{var}\left({\widehat{\beta}}^{\star}\right)$.

We provide details on computation of maximum likelihood (ML) estimates conditional on *m*, computation of the approximate posterior model probabilities, and model averaging.

Denote the set of all parameters by *ω* = (*β ^{T}*,

The likelihood contribution for subject *i* corresponding to the models described in Section 2 is

$$\begin{array}{cc}\hfill & {L}_{i}({Y}_{i},{D}_{i};\omega )\propto \int \sum _{j=1}^{M}{L}_{i}({Y}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{S}_{ij}=1,{b}_{i};{\alpha}^{\left(j\right)},\varphi )\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\times p({S}_{ij}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{D}_{i};\lambda )p({D}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}\gamma )\phantom{\rule{thinmathspace}{0ex}}dF({b}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{S}_{ij},{\theta}_{j}),\hfill \end{array}$$

(6)

where

$$\begin{array}{cc}\hfill & {L}_{i}({Y}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{S}_{ij}=1,{b}_{i};{\alpha}^{\left(j\right)},\varphi )\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}=\text{exp}[\{{Y}_{i}^{T}{\eta}_{i}-\psi \left({\eta}_{i}\right)\}\u2215\left({m}_{i}\varphi \right)+{1}^{T}h({Y}_{i},\varphi )],\hfill \end{array}$$

with ${\eta}_{i}={\Delta}_{i}+{b}_{i}1+{\sum}_{j=1}^{M}{S}_{ij}{Z}_{i}{\alpha}^{\left(j\right)},p({S}_{ij}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{D}_{i};\lambda )$ is defined in (5), and *p*(*D _{i}*|

Maximization of $\text{log}\left\{{\prod}_{i=1}^{n}{L}_{i}({Y}_{i},{D}_{i};\omega )\right\}$ with respect to the parameters *ω* is complicated by the possibly intractable integral in (6), and the need to calculate Δ_{it} at each iteration in the algorithm for every record in the data set. We provide details of the maximization algorithm in the Appendix.

The models introduced in Section 2 are indexed by the number of latent classes *m* (*m* = 1,..., *M*, *M* < *T*). Given that our main interest is in the regression parameters *β*, it would be sensible to properly account for the uncertainty in the regression coefficients by averaging over the number of classes as opposed to conditioning the most likely number of classes. To do this, we need to first specify a prior distribution on the number of latent classes, *m*. We recommend specifying a prior to favor parsimony and/or to be consistent with subject matter considerations (if available). A convenient specification is a truncated Poisson distribution with rate parameter, *μ*, and truncated at an integer between 1 and *T*. Denote this prior as *p*(*m*). The posterior probability of *m* classes is given by the expression,

$$p(m\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}y,x)=\frac{p(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m,x)p\left(m\right)}{p(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}x)},$$

where *p*(*y*|*x*) = ∑_{m }*p*(*y*|*m*, *x*)*p*(*m*) and *p*(*y*|*m*, *x*) are the integrated likelihood, that is,

$$\begin{array}{cc}\hfill p(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m,x)=& \int p(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m,x,{\beta}^{\left(m\right)},{\alpha}^{\left(m\right)},\lambda ,\gamma ,\theta )p\left(\lambda \right)\hfill \\ \hfill & \times p({\alpha}^{\left(m\right)}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m)p({\beta}^{\left(m\right)}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m)p\left(\gamma \right)\hfill \\ \hfill & \times p\left(\theta \right)\phantom{\rule{thinmathspace}{0ex}}d{\beta}^{\left(m\right)}\phantom{\rule{thinmathspace}{0ex}}d{\alpha}^{\left(m\right)}\phantom{\rule{thinmathspace}{0ex}}d\lambda \phantom{\rule{thinmathspace}{0ex}}d\gamma \phantom{\rule{thinmathspace}{0ex}}d\theta ,\hfill \end{array}$$

where *p*(*y*|*m*, *x*, *β*^{(m)}, *α*^{(m)}, *λ*, *γ*, *θ*) = ∑_{s} ∑_{D }*p*(*y*|*m*, *x*, *β*^{(m)}, *α*^{(m)}, *θ*)*p*(*S*|*m*, *x*, *D*, *λ*)*p*(*D*|*m*, *x*, *γ*). Unfortunately, this integral is not available in closed form. We propose to use a Laplace approximation to evaluate this integral,

$$\widehat{p}(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m,x)={\left(2\pi \right)}^{d\u22152}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}\widehat{\Sigma}\phantom{\rule{thinmathspace}{0ex}}\mid {\phantom{\rule{thinmathspace}{0ex}}}^{1\u22152}p(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m,x,{\widehat{\beta}}^{\left(m\right)},{\widehat{\alpha}}^{\left(m\right)},\widehat{\lambda},\widehat{\gamma},\widehat{\theta}),$$

(7)

where *d* = dim(*β,α,λ,γ,θ*) and $({\widehat{\beta}}^{\left(m\right)},{\widehat{\alpha}}^{\left(m\right)},{\widehat{\lambda}}^{\left(m\right)},{\widehat{\gamma}}^{\left(m\right)},{\widehat{\theta}}^{\left(m\right)})$ are the joint ML estimates of *β*^{(m)}, *α*^{(m)}, *λ*^{(m)}, *γ*^{(m)}, *θ*^{(m)}) for the model with *m* classes, *p*(*y*|*m*, *x*, ${\widehat{\beta}}^{\left(m\right)}$, ${\widehat{\alpha}}^{\left(m\right)}$, $\widehat{\lambda}$, $\widehat{\gamma}$, $\widehat{\theta}$) is the value of the maximized integrated likelihood, and $\widehat{\Sigma}$ is the inverse of the observed information matrix for (*β*, *α*, *λ*, *γ*, *θ*) based on the integrated likelihood (6). These estimates are obtained using the algorithm described in the Appendix. It is clear that in (7) we have ignored the contribution of the prior, *p*(*λ*)*p*(*α*^{(m)}|*m*)*p*(*β*^{(m)}|*m*)*p*(*γ*)*p*(*θ*), evaluated at the joint ML estimates. This is justified (asymptotically) because the maximized likelihood term, *p*(*y*|*m*, *x*, ${\widehat{\beta}}^{\left(m\right)}$, ${\widehat{\alpha}}^{\left(m\right)}$, $\widehat{\lambda}$, $\widehat{\gamma}$, $\widehat{\theta}$) is *O _{p}*(

$$\widehat{p}(m\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}y,x)=\frac{\widehat{p}(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m,x)p\left(m\right)}{\sum _{m}\widehat{p}(y\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m,x)p\left(m\right)}.$$

(8)

Once the posterior distribution *p*(*m*|*y*) is estimated, we can then estimate the covariate effects averaged across class sizes. As described previously, we denote the average covariate effect over classes as *β**, which can be estimated as ${\widehat{\beta}}^{\star}={\sum}_{m}{\widehat{\beta}}^{\left(m\right)}\widehat{p}(m\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}y)$. The variation of ${\widehat{\beta}}^{\ast}$ is

$$\begin{array}{cc}\hfill \text{var}\left({\widehat{\beta}}^{\ast}\right)& =\mathrm{E}\left[\text{var}({\widehat{\beta}}^{\ast}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}M)\right]+\text{var}\left(\mathrm{E}[{\widehat{\beta}}^{\ast}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}M]\right)\hfill \\ \hfill & =\sum _{m}\text{var}({\widehat{\beta}}^{\ast}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m)p(m\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}y)+\text{var}\left(\mathrm{E}[{\widehat{\beta}}^{\left(m\right)}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}M]\right),\hfill \end{array}$$

which can be estimated as

$$\begin{array}{cc}\hfill \widehat{\text{var}}\left({\widehat{\beta}}^{\star}\right)=& \sum _{m}\text{var}({\widehat{\beta}}^{\left(m\right)}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}m)\widehat{p}(m\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}y)\hfill \\ \hfill & +\sum _{m}{({\widehat{\beta}}^{\left(m\right)}-{\widehat{\beta}}^{\ast})}^{\otimes 2}\widehat{p}(m\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}y).\hfill \end{array}$$

Note that if we conditioned the most likely value for the number of classes, *m*, the variance of the estimated regression coefficients would likely be too small due to ignoring the second term in the variance expression above.

Conditional independence between *Y* and *D* given *S* and *X* is a key assumption with this modeling approach. A simple method for checking the conditional independence assumption for a given class size *M* is as follows: this approach was originally proposed by Lin, McCulloch, and Rosenheck (2004), as a modification to the test proposed by Bandeen-Roche et al. (1997). The goal is to test the null hypothesis that model (4) holds versus the alternative that the true model is

$$\begin{array}{cc}\hfill & g\left\{\mathrm{E}({Y}_{it}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{b}_{i},{S}_{i},{D}_{i})\right\}\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{thickmathspace}{0ex}}={\Delta}_{it}+{b}_{i}+\sum _{j=1}^{M}{S}_{ij}{Z}_{it}^{T}{\alpha}^{\left(j\right)}+\sum _{j=1}^{J}{h}_{j}\left({D}_{i}\right){\varphi}_{j},\hfill \end{array}$$

(9)

where each *h _{j}*(·) is a known function and the 's are parameters. The null hypothesis is that

First, fit the null model (3). We can then estimate the posterior probability of class membership for each subject as

$$\begin{array}{cc}\hfill & \widehat{P}({S}_{ij}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{D}_{i},{Y}_{i},{X}_{i};\widehat{\omega})\hfill \\ \hfill & =\frac{\int {L}_{i}({Y}_{obs,i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{S}_{ij}=1,{b}_{i};{\widehat{\alpha}}^{\left(j\right)},\widehat{\varphi})p({S}_{ij}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{D}_{i};\widehat{\lambda})p({D}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}\widehat{\gamma})dF({b}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{S}_{ij},{\widehat{\theta}}_{j})}{{L}_{i}({Y}_{i},{D}_{i};\widehat{\omega})},\hfill \end{array}$$

where *L _{i}*(

As briefly described in the Introduction, we were interested in analyzing data on the longitudinal course of depression of 850 HIV-infected women from the HERS. Depression was measured using the Center for Epidemiologic Studies Depression Scale (CES-D). The CES-D includes 20 questions related to mood, each of which can take a value from 0 (symptom rarely present) to 3 (symptom almost always present). Larger scores indicate the presence of more symptoms, and scores range from 0 to 60. A score of 16 or greater is frequently used as a depression cutoff (e.g., Cook et al., 2004). We therefore defined our outcome *Y _{it}* as the indicator of depression at visit

The observed proportion of depression decreased over time. However, the sample mean is only a valid estimate of the prevalence at each visit if the missing data were missing completely at random (MCAR); it would not be surprising if depression status was related to dropout. There was a substantial amount of dropout. By visit 12, less than half of the original sample remained in the study. We would like to account for the possibility that the prevalence of depression over time might be related to the dropout time.

We first fitted a marginally specified logistic regression model under the MAR assumption. This could also be thought of as a special case of the proposed latent class model, but with *M* = 1 class. We assumed models (1) and (2) hold, where the covariate vector includes an intercept, indicator of black race (black), an indicator of Hispanic ethnicity (latina), an indicator of other race/ethnicity (other), number of HIV-related symptoms during the 6 months prior to the baseline visit (symptoms), an indicator that the subject has been an IDU, number of adverse events in 6 months prior to the baseline visit (adverse), and the HERS visit number (visit). Only visit was a time-varying covariate. White race was the reference category for the race/ethnicity variable.

We next fitted models (3–5), with *M* equal to classes 2, 3, and 4. The covariate vector *X _{it}* was the same as used in the previous model. We also set

The results are given in Tables 1 and and2.2. In Table 1, we compared the four models based on the components of the Laplace approximation of the marginal distribution (7) and the corresponding approximate posterior distribution of the number of classes. First, we examined the maximized likelihood, $p(y\phantom{\rule{thickmathspace}{0ex}}\mid \phantom{\rule{thickmathspace}{0ex}}\widehat{\omega})$. There was a substantial increase in the likelihood (relative to the increase in the number of parameters) by going from 1 to 2 classes. Similarly, there was a modest gain in the likelihood by going from 2 to 3 classes. The likelihood for the four-class model was almost identical to that in the three-class model. The four-class model provided essentially the same fit as the three-class model, but with nine extra parameters. Besides the maximized likelihood, the term (*d/*2)log(2*π*) always increases as the number of parameters (d) increases. However, the determinant of the estimated covariance matrix, $\mid \widehat{\Sigma}\mid $ typically decreases as the number of parameters increases; this acts as a “penalty” term for adding parameters. In particular, consider the comparison between models 3 and 4. In model 4 we added nine new parameters. These parameters did little to improve the fit to the data, as the likelihood only increased by a small amount. These parameters were not well identified by the model, and tended to have large variances and high correlation with other parameters. This caused the determinant of the estimated covariance matrix to be considerably smaller than from the three-class model.

The components of the Laplace approximation to the marginal likelihood and the corresponding approximate posterior model probabilities under two priors for the number of classes: a discrete uniform prior and a truncated Poisson prior

Estimates and standard errors of marginal coefficients β^{(m)}. The estimated covariate effects averaged over classes, β*, *are also given in the column for M* = 3.

The posterior distribution of *M* was insensitive to the choice of the prior (*p*(*M* = 3|*y*, *x*) = 0.9997 with the uniform prior, and *p*(*M* = 3|*y*, *x*) = 0.9987 with the truncated Poisson prior). The three-class model was the clear “winner” based on the posterior model probabilities; no reasonable prior would change this conclusion. Due to the closeness of the posterior probability of the three-class model to 1, there was no need to carry out the model averaging. In particular, recall that ${\widehat{\beta}}^{\star}={\sum}_{m}{\widehat{\beta}}^{\left(m\right)}\widehat{p}(m\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}y)$. Because, from Table 1, (*M* = 3|*y*) = 1, then the estimated parameters from the three-class model, ${\widehat{\beta}}^{\left(3\right)}$, were equivalent to the estimated parameters that were averaged over the number of classes ${\widehat{\beta}}^{\ast}$.

The marginal regression coefficient estimates are presented in Table 2 for each model. The parameter estimates from the one-class model were quite different from the models with multiple classes. For example, based on the one-class model, we might conclude that the prevalence of depression was lower for blacks. However, once we account for dropout using the latent class model, we conclude the opposite.

Because the posterior probabilities overwhelmingly favored the three-class model, we will now focus on this model for our conclusions. Blacks, Latinas, and other non-white racial and ethnic groups were estimated to have a significantly higher prevalence of depression as compared with whites. IDU, the number of adverse events, and HIV-related symptoms were associated with higher prevalence of depression. There was a significant, but somewhat gradual, decline in depression over time. We also considered interactions between race/ethnicity and visit number, but these interactions did not appear to be important in describing the data.

Table 3 displays estimated latent class probabilities as a function of dropout time, using the estimated values of *λ*, the ordinal regression parameters in (5). Individuals who dropped out early (after visit 1) were very likely to be in class 1. Individuals who remained in the study until the end were most likely to be in class 2. Class 3 consisted of a small subpopulation of the subjects who dropped out in the final few visits of the study.

We used the method described in Section 3.4 to test the null hypothesis of conditional independence. For each value of *M* (1–4), we fitted model (9), with ${\sum}_{j=1}^{J}{h}_{j}\left({D}_{i}\right){\varphi}_{j}={D}_{i}\varphi $. The test statistic, which, under the null hypothesis follows an approximate ${\chi}_{1}^{2}$ distribution, had values of 7.81, 2.64, 0.41, and 0.41 for *M* = 1 to *M* = 4, respectively. Thus, with respect to the specific alternative of a linear effect of dropout time, the conditional independence assumption appeared to be reasonable for *M* = 3.

We carried out a brief simulation study, primarily to examine the effectiveness of the approximation to fully Bayesian inference. For covariates, we used variables from the HIV data described in the previous section. In particular, the *X* matrix included an intercept, the indicator of IDU, and visit number. The true values of the *β* parameters were –1.1, 0.45, and –0.02 for the intercept, IDU, and visit, respectively.

We first generated the response for the case where *M* = 1 (where the MAR assumption holds). The missing data pattern was just the observed pattern from the HIV data. The response was generated from models (1) and (2). We also generated data for the case where *M* = 2. The missing data pattern was the same, but now the response depended on class membership. The latent class variable was generated from model (5) with *λ*_{01} = 4 and *λ*_{1} = –0.7. We then set *α*^{(1)} = (0.003, – 0.16, 0.24)* ^{T}* in (4) and generated the response. In each case, the variance of the random intercept was

For each generated data set, we fitted a marginally specified logistic regression model under the MAR assumption (*M* = 1). We also fitted the latent class model proposed in the manuscript. In that case, we fitted a one-, two-, and three-class model, and carried out model averaging assuming a discrete uniform prior over the three classes. One hundred simulated data sets were analyzed under each scenario. The percentage bias, average estimated standard error (SE), the estimated standard deviation of the estimates (ESD), as well as coverage probability were recorded. For model averaging, ${\widehat{\beta}}^{\ast}$ was reported. The results are given in Table 4.

Results from simulation study. Percentage bias, average of the estimated standard errors (SE), empirical standard deviation (ESD), and 95% coverage probabilities are reported for the estimated marginal regression coefficients.

When the data were generated under the MAR assumption (*M* = 1), both modeling approaches worked reasonably well. The estimates had very little bias. The SEs tended to be slightly underestimated. Coverage was below the nominal for the intercept. We did not expect the coverage and ERs to be exact as we used large sample results for inference here.

When data were generated from the two-class model (MAR assumption violated), the model that relied on the MAR assumption (*M* = 1) no longer performed well. In general, coverage probabilities were too low. In particular, the estimated coefficient of visit number had a large negative bias (582%) and no coverage. The model averaging approach yielded better results. The coefficient of visit number had negative bias (26%) with coverage probability of 0.91. The bias comes from putting some weight on the incorrect model (MAR); the coefficient of visit number conditional on *M* = 2 had a bias of just 3%.

For data generated from the one-class model (MAR), the one-class model had the highest posterior probability in 44% of samples. Here, the two-class model was slightly favored, which is only an incorrect model in the sense that it has more parameters than necessary. For data generated from the two-class model, the two-class model had the highest posterior probability in 81% of samples. The one-class model (MAR) only had the highest probability in 2% of samples.

To confirm that the model probabilities would converge to the correct values as the sample size increased, we simulated data from the same model as described above, but with a sample size of 3400 (4 copies of the covariate data from 850 subjects were used). We fitted five simulated data sets from the one-class model (MAR) and from the two-class model. In each case, the posterior model probability for the correct *M* was greater than 0.99.

We have proposed a new model for dealing with nonignorable missing data that parsimoniously addresses data sets with many possible dropout times (in an automated fashion) and directly models the marginal covariate effects of interest. Via approximate posterior model probabilities for the number of latent classes, this approach properly takes into account uncertainty in the unknown number of classes.

We fitted the model using approximate Bayesian methods. Reversible jump Markov chain Monte Carlo methods (Green, 1995) would be required to fit a fully Bayes model because the dimension of the parameter space changes with the number of latent classes.

For the model proposed here, we have assumed a simple within-class longitudinal dependence structure through the introduction of a random intercept. More flexible specifications of the dependence structure could be obtained by replacing the scalar random effect *b _{i}* with a set of correlated random effects

Alternative methods for specifying marginal effects for correlated binary data have been proposed. Caffo, An, and Rohde (2006) proposed a model for binary data with random effects, which uses mixtures of normals. Their approach is less computationally intensive than the Heagerty (1999) approach that we implemented here. However, extending their approach to also average over the discrete latent dropout distribution would likely prove challenging. In particular, the additional step of averaging over the latent dropout classes would make it difficult to preserve the marginal probit interpretation. Wang and Louis (2003) proposed a bridge distribution function for binary random intercept models. However, extending their approach to our setting would likely pose similar problems for mixture of normals approach of Caffo et al. (2006).

The model proposed here assumes conditional independence between the outcome and dropout processes, given the latent class and covariates. We tested this assumption against a very simple alternative hypothesis (linear effect of dropout time). A more complicated approach would be to leave the functional form of the dependence unspecified. Specifically, we could assume

$$g\left\{\mathrm{E}({Y}_{it}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{b}_{i},{S}_{i},{D}_{i})\right\}={\Delta}_{it}+{b}_{i}+\sum _{j=1}^{M}{S}_{ij}{Z}_{it}^{T}{\alpha}^{\left(j\right)}+f({D}_{i)},$$

where *f*(·) is a smooth, but otherwise unspecified function. The null hypothesis of conditional independence would be *f*(*D _{i}*) = 0. We plan to explore a score-type test similar to that proposed by Zhang and Lin (2003) and Lin, Zhang, and Davidian (2006) and examine its asymptotic distribution for the models proposed here.

Dr Roy's research was supported by National Institutes of Health grant R01-HL-79457. Dr Daniels received National Institutes of Health grants R01-HL-79457 and R01-CA-85295. Data from the HER Study were collected under grant U64-CCU10675 from the U.S. Centers for Disease Control and Prevention. The authors thank the associate editor and a referee for their helpful comments and suggestions.

We propose the following approach to compute the ML estimates. First, we obtain initial values of the parameters. Initial values of *β* and *θ* could be obtained from ML estimates of a model that assumes an ignorable missing data mechanism. The parameters *λ* initially should be selected in a way that leads to marginal probabilities not too close to zero for any latent class. Initial values of *α* could be obtained by fitting a pattern mixture model with *M* groups of dropout times that have fixed boundaries. Given the data and parameters *ω*, we next calculate Δ* _{it}* for all

$$\begin{array}{cc}\hfill & h\left({\Delta}_{it}\right)\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}=\sum _{d=1}^{T}\sum _{j=1}^{M}\left\{\int {g}^{-1}({\Delta}_{it}+{b}_{i}+{Z}_{it}^{T}{\alpha}^{\left(j\right)})p({b}_{i}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{S}_{ij}=1)\phantom{\rule{thinmathspace}{0ex}}d{b}_{i}\right\}\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\times p({S}_{ij}=1\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{D}_{i}=d)p({D}_{i}=d)\hfill \end{array}$$

and *p*(*b _{i}*|

Jason Roy, Center for Health Research, Geisinger Health System, Danville, Pennsylvania 17822, U.S.A. Email: ude.regnisieg@yoraj.

Michael J. Daniels, Departments of Epidemiology and Biostatistics and Statistics, University of Florida, Gainesville, Florida 32611-8545, U.S.A. Email: ude.lfu.tats@sleinadm.

- Bandeen-Roche K, Miglioretti DL, Zeger SL, Rathouz PJ. Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association. 1997;92:1375–1386.
- Caffo B, An M-W, Rohde CA. A flexible general class of marginal and conditional random intercept models for binary outcomes using mixtures of normals. Johns Hopkins University; 2006. Available at: http://www.bepress.com/jhubiostat/paper98. Department of Biostatistics Working Papers. Working Paper 98.
- Cook JA, Grey D, Burke J, Cohen MH, Gurtman AC, Richardson JL, Wilson TE, Young MA, Hessol NA. Depressive symptoms and AIDS-related mortality among a multisite cohort of HIV-positive women. American Journal of Public Health. 2004;94:1133–1140. [PubMed]
- Diggle PJ, Heagerty PJ, Liang K-Y, Zeger SL. The Analysis of Longitudinal Data. 2nd edition Oxford University Press; New York: 2002.
- Fitzmaurice GM, Laird NM, Shneyer L. An alternative parameterization of the general linear mixture model for longitudinal data with non-ignorable dropouts. Statistics in Medicine. 2001;20:1009–1021. [PubMed]
- Green PJ. Reversible jump MCMC computation and Bayesian model determination. Biometrika. 1995;82:711–732.
- Heagerty PJ. Marginally specified logistic-normal models for longitudinal binary data. Biometrics. 1999;55:688–698. [PubMed]
- Heagerty PJ. Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics. 2002;58:342–351. [PubMed]
- Hogan JW, Laird NM. Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine. 1997;16:239–257. [PubMed]
- Hogan JW, Roy J, Korkontzelou C. Handling dropout in longitudinal data. Statistics in Medicine. 2004;23:1455–1497. [PubMed]
- Lin H, McCulloch CE, Rosenheck RA. Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics. 2004;60:295–305. [PubMed]
- Lin J, Zhang D, Davidian M. Smoothing spline-based score tests for proportional hazards models. Biometrics. 2006;62:803–812. [PMC free article] [PubMed]
- Little RJA. A class of pattern mixture models for normal missing data. Biometrika. 1994;81:471–483.
- Little RJA. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association. 1995;90:1112–1121.
- Roy J. Modeling longitudinal data with non-ignorable dropouts using a latent dropout class model. Biometrics. 2003;59:829–836. [PubMed]
- Smith DK, Warren DL, Vlahov D, Schuman P, Stein MD, Greenberg BL, Holmberg SD. Design and baseline participant characteristics of human immunodeficiency virus epidemiology research (HER) study: A prospective cohort study of human immunodeficiency virus infection in US women. American Journal of Epidemiology. 1997;146:459–469. [PubMed]
- Wang Z, Louis T. Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function. Biometrika. 2003;90:765–775.
- Wilkins KJ, Fitzmaurice GM. A hybrid model for nonignorable dropout in longitudinal binary responses. Biometrics. 2006;62:168–176. [PubMed]
- Zhang D, Lin X. Hypothesis testing in semiparametric additive mixed models. Biostatistics. 2003;4:57–74. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |