
Commun Stat Theory Methods. Author manuscript; available in PMC 2010 June 2.

Published in final edited form as:

Commun Stat Theory Methods. 2009 September 1; 38(15): 2604–2619.

doi: 10.1080/03610920802585849

PMCID: PMC2879593

NIHMSID: NIHMS205831

Address correspondence to Li Qin, Center for Research on Health Care, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Email: qinl@upmc.edu

Nonignorable missing data are a common problem in longitudinal studies. Latent class models are attractive for simplifying the modeling of missing data when the data are subject to either a monotone or an intermittent missing data pattern. We propose a new two-latent-class model for categorical data with informative dropouts that divides the observed data into two latent classes: one in which the outcomes are deterministic and one in which the outcomes can be modeled using logistic regression. In the model, the latent classes connect the longitudinal responses and the missingness process under the assumption of conditional independence. Parameters are estimated by maximum likelihood based on these assumptions and on the tetrachoric correlation between responses within the same subject. We compare the proposed method with the shared parameter model and the weighted GEE model using the areas under the ROC curves, both in simulations and in an application to a smoking cessation data set. The simulation results indicate that the proposed two-latent-class model performs well under different missing procedures. In the application, the proposed method yields a larger area under the ROC curve than the shared parameter model and the weighted GEE model.

Missing data are a common issue in the analysis of longitudinal data. In the behavioral intervention setting, missed visits and loss to follow-up can be extremely problematic. In this area, missed visits are often assumed to result from failure of the intervention, a sustained lack of interest in the study, or a decreased desire to change the behavior. In smoking cessation and weight loss studies, these are common issues that must be dealt with at the data analysis phase. For example, Perkins et al. (2001) conducted a study of weight concern and smoking cessation. The purpose of the study was to determine whether cognitive-behavioral therapy can reduce weight concern and increase the success of smoking cessation. The study included 219 women who were randomized to one of three groups: (i) behavioral weight control to prevent weight gain (weight control); (ii) cognitive-behavioral therapy to reduce weight concerns (CBT); or (iii) nonspecific social support (standard), which involved no discussion of weight. Participants were assessed for smoking abstinence, a binary measure, at 4 weeks postquit and 12 months postquit. However, the outcomes at the second time point were missing for some women because of dropout. The usual assumption in the smoking cessation literature is that these women were smoking, so all missing outcome values are set to zero (0 = smoking; 1 = not smoking) for the purposes of analysis. This convention can introduce bias, because not every woman who drops out is smoking.

In this article, we want to address the problem of informative missingness, in which the missing status depends on unknown outcome values. For this type of missingness, there are two main methods: selection models and pattern mixture models. In selection models, the joint distribution of outcomes and missingness is partitioned into the marginal distribution of outcome and the conditional distribution of missingness given outcomes. As an alternative to selection models, pattern mixture models work with the factorization of the joint distribution of outcomes and missingness into the marginal distribution of missingness and the conditional distribution of outcomes given missingness. Latent class models are another approach for informative missingness, and can be framed as a special case of a pattern mixture model, in which outcomes are divided into groups by the latent classes instead of by the missing patterns.
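As a compact summary of the two factorizations just described, write $\mathbf{y}$ for the outcomes and $\mathbf{r}$ for the missingness indicators. The generic density notation $f$ is introduced here only for illustration, and covariates are suppressed.

$$\text{selection model: } f(\mathbf{y},\mathbf{r})=f(\mathbf{y})\,f(\mathbf{r}\mid \mathbf{y}),\qquad \text{pattern mixture model: } f(\mathbf{y},\mathbf{r})=f(\mathbf{r})\,f(\mathbf{y}\mid \mathbf{r}).$$

In a latent class formulation with latent class $\eta$, and assuming that outcomes and missingness are conditionally independent given $\eta$, the conditional distribution in the pattern mixture form becomes a mixture over classes, $f(\mathbf{y}\mid \mathbf{r})=\sum_{\eta}f(\mathbf{y}\mid \eta)\,\mathit{Pr}(\eta \mid \mathbf{r})$, so that the grouping is by latent class rather than by missing pattern.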

In latent variable models, variables are classified as ‘manifest’ when they can be directly observed and as ‘latent’ when they cannot. A latent class model is a latent variable model in which the latent variables are categorical. Latent class models have been applied widely in medicine for diagnosis (e.g., Garret and Zeger, 2000; Hadgu and Qu, 1998). More recently, latent class models have also been used to deal with missing data. Reboussin et al. (2002) proposed a latent class model for multiple binary longitudinal outcomes subject to data missing at random. The idea behind this method is to reweight the binary outcomes by the inverse probability of being observed, which extends the weighted GEE approach of Robins et al. (1995). Roy (2003) proposed a latent dropout class model for continuous data with nonignorable dropouts. Latent dropout class models are based on the assumption that a small number of latent classes exist behind the sparse observed dropout times and that the probability of being in a given class is determined by the time of dropout.

Here, we propose a two-latent-class model for longitudinal binary response data with informative dropouts. In the proposed model, the observed data are divided into two latent classes: a ‘homogeneous’ class, in which all subjects have the same, deterministic outcome (in the women’s smoking cessation study, we assume that the subjects in this class are always smoking), and a ‘heterogeneous’ class, in which subjects may have different outcomes that can be modeled using logistic regression. The latent variable serves as a mechanism to induce independence between the outcome and the missing status; that is, the dropout process and the response process are assumed to be independent given the latent class. Because these assumptions cannot be verified, we assess sensitivity by comparing the proposed model with other models, namely the shared parameter model (Ten Have et al., 1998) and weighted GEE (Robins et al., 1995). The ‘two-latent-class’ assumption is not appropriate for all kinds of data, but it is reasonable in some special cases. In the smoking study, for example, some subjects are much more vulnerable to smoking than others, and the outcomes are believed to be related to genetic factors (Bergen and Caporaso, 1999).

The proposed two-latent-class model (TLCM) is based on the assumption that the binary responses are manifestation of latent classes. Here, we consider bivariate binary outcomes, which attempt to characterize the latent classes, and define **Y*** _{i}* =

Our method is motivated by the smoking cessation data, in which some subjects’ outcomes are not affected by interventions or other factors. We define *η _{i}* = (

Suppose for the subjects in the ‘heterogeneous’ class, the conditional outcome probability, *p _{ij}* =

$$\mathrm{logit}\left({p}_{ij}\right)={\beta}^{T}{\mathbf{x}}_{ij}$$

or

$${p}_{ij}=\frac{\mathrm{exp}\left({\beta}^{T}{\mathbf{x}}_{ij}\right)}{1+\mathrm{exp}\left({\beta}^{T}{\mathbf{x}}_{ij}\right)},$$

(1)

where **x**_{ij} is a vector of covariates (possibly time-dependent) and ** β** is a vector of regression coefficients. Note that

We let *p _{i}*

$$\begin{array}{cc}\hfill {q}_{i11}=& {\int}_{-\infty}^{{g}_{i1}}{\int}_{-\infty}^{{g}_{i2}}f({t}_{1},{t}_{2},\rho )d{t}_{2}d{t}_{1}\hfill \\ \hfill =& {p}_{i1}{p}_{i2}+n\left({g}_{i1}\right)n\left({g}_{i2}\right)\sum _{k=0}^{\infty}\frac{1}{(k+1)!}{H}_{k}\left({g}_{i1}\right){H}_{k}\left({g}_{i2}\right){\rho}^{k+1},\hfill \end{array}$$

(2)

where *f*(*t*_{1}, *t*_{2}, *ρ*) is the joint density function of the standardized bivariate normal distribution with correlation *ρ*, *n*(*u*) = (2π)^{−1/2} exp(−*u*^{2}/2) is the density function of the standard normal distribution, *ρ* is the correlation of *Y _{i}*

$${H}_{k}\left(\nu \right)=\sum _{i=0}^{[k/2]}\frac{k!}{i!(k-2i)!}{(-1)}^{i}{2}^{-i}{\nu}^{k-2i}$$

(3)

are the Hermite polynomials, where [*k*/2] denotes the largest integer not exceeding *k*/2. After obtaining *p _{i}*
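The series in Eq. (2) with the Hermite polynomials of Eq. (3) can be evaluated directly. The following is a minimal sketch, not the authors' implementation. It assumes that the thresholds *g* are standard normal quantiles, so that each marginal probability equals the standard normal distribution function evaluated at the corresponding threshold. The function names and the truncation at 40 terms are our own choices.

```python
import math


def hermite(k, v):
    """H_k(v) evaluated from the finite sum in Eq. (3)."""
    total = 0.0
    for i in range(k // 2 + 1):
        coef = math.factorial(k) / (math.factorial(i) * math.factorial(k - 2 * i))
        total += coef * (-1) ** i * 2.0 ** (-i) * v ** (k - 2 * i)
    return total


def normal_pdf(u):
    """Standard normal density n(u)."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)


def normal_cdf(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def q11_series(g1, g2, rho, n_terms=40):
    """Truncated tetrachoric series for q11 in Eq. (2).

    Assumption (ours): p1 = Phi(g1) and p2 = Phi(g2).
    """
    p1, p2 = normal_cdf(g1), normal_cdf(g2)
    series = sum(
        hermite(k, g1) * hermite(k, g2) * rho ** (k + 1) / math.factorial(k + 1)
        for k in range(n_terms)
    )
    return p1 * p2 + normal_pdf(g1) * normal_pdf(g2) * series


# Sanity check against the closed form at g1 = g2 = 0,
# where the orthant probability equals 1/4 + arcsin(rho) / (2 * pi).
value = q11_series(0.0, 0.0, 0.5)
closed_form = 0.25 + math.asin(0.5) / (2.0 * math.pi)
```

With zero correlation the series term vanishes and the function returns the product of the two marginal probabilities, which is a quick way to check an implementation.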

For the subjects in the homogeneous class, *Pr*(*Y _{ij}* = 0|

$$\mathrm{logit}\left({\lambda}_{i}\right)={\alpha}_{0}+{\alpha}_{1}{r}_{i},$$

(4)

where *r _{i}* is the observed value of

$${e}_{0}=\mathit{Pr}({\eta}_{i}=1\mid {R}_{i}=0)=\frac{\mathrm{exp}\left({\alpha}_{0}\right)}{1+\mathrm{exp}\left({\alpha}_{0}\right)},$$

(5)

and

$${e}_{1}=\mathit{Pr}({\eta}_{i}=1\mid {R}_{i}=1)=\frac{\mathrm{exp}({\alpha}_{0}+{\alpha}_{1})}{1+\mathrm{exp}({\alpha}_{0}+{\alpha}_{1})}.$$

(6)

Because we cannot observe the values of latent class *η _{i}* from the data, we will estimate

Letting **y**_{i} denote the vector of observed longitudinal binary responses for the *i*th subject, the likelihood for our proposed model is

$$\begin{aligned} L(\mathbf{y},r\mid \mathbf{x}) &= \prod_{i=1}^{n} L({\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{r}_{i})\,L({r}_{i}) \\ &= \prod_{i=1}^{n} \left[ L({\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{r}_{i},{\eta}_{i}=1)\,\mathit{Pr}({\eta}_{i}=1\mid {r}_{i})\,L({r}_{i}) + L({\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{r}_{i},{\eta}_{i}=0)\,\mathit{Pr}({\eta}_{i}=0\mid {r}_{i})\,L({r}_{i}) \right] \\ &= \prod_{i=1}^{n} \left[ L({\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{\eta}_{i}=1)\,\mathit{Pr}({\eta}_{i}=1\mid {r}_{i})\,L({r}_{i}) + L({\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{\eta}_{i}=0)\,\mathit{Pr}({\eta}_{i}=0\mid {r}_{i})\,L({r}_{i}) \right]. \end{aligned}$$

(7)

Here, we let *L*(**y**_{i} | **x**_{i}, *r _{i}*,

Based on the description above, the likelihood function including the parameters of interest can be written as

$$\begin{aligned} L(\beta ,\mathbf{e},\rho ;\mathbf{y},r\mid \mathbf{x}) ={}& \prod_{i=1}^{n} \left[ L(\beta ,\rho ,{\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{\eta}_{i}=1)\,\mathit{Pr}({\eta}_{i}=1\mid {r}_{i}) + L(\beta ,\rho ,{\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{\eta}_{i}=0)\,\mathit{Pr}({\eta}_{i}=0\mid {r}_{i}) \right] L({r}_{i}) \\ ={}& \prod_{i=1}^{n} \Big[ \big( \mathit{Pr}({\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{\eta}_{i}=1)\,{e}_{1} + I({Y}_{i1}=0,{Y}_{i2}=0)(1-{e}_{1}) \big)\, I({R}_{i}=1) \\ & + \big( \mathit{Pr}({y}_{i1}\mid {\mathbf{x}}_{i1},{\eta}_{i}=1)\,{e}_{0} + I({Y}_{i1}=0)(1-{e}_{0}) \big)\, I({R}_{i}=0) \Big] L({r}_{i}), \end{aligned}$$

(8)

where *I*(·) is the indicator function, and we assume that *Pr*(*y _{i}*

We let *y _{ist}* be the observed value of

$$L(\beta ,\mathbf{e},\rho ;\mathbf{y},r\mid \mathbf{x})=\prod _{i=1}^{n}\left[{q}_{i11}^{{y}_{i11}}{q}_{i10}^{{y}_{i10}}{q}_{i01}^{{y}_{i01}}{e}_{1}^{1-{y}_{i00}}{({q}_{i00}{e}_{1}+1-{e}_{1})}^{{y}_{i00}}\,I({R}_{i}=1)+{({p}_{i1}{e}_{0})}^{{y}_{i1}}{(1-{p}_{i1}{e}_{0})}^{1-{y}_{i1}}\,I({R}_{i}=0)\right]L({r}_{i}).$$

(9)

Based on Eq. (9), the log-likelihood function is

$$\begin{aligned} l(\beta ,\mathbf{e},\rho ;\mathbf{y},r\mid \mathbf{x}) ={}& \sum_{i=1}^{n} \left[ (1-{y}_{i00}) \left( {y}_{i11}\log {q}_{i11} + {y}_{i10}\log {q}_{i10} + {y}_{i01}\log {q}_{i01} + \log {e}_{1} \right) + {y}_{i00}\log ({q}_{i00}{e}_{1}+1-{e}_{1}) \right] I({R}_{i}=1) \\ & + \sum_{i=1}^{n} \left[ {y}_{i1}(\log {p}_{i1} + \log {e}_{0}) + (1-{y}_{i1})\log (1-{p}_{i1}{e}_{0}) \right] I({R}_{i}=0). \end{aligned}$$

(10)
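The per-subject contribution to the log-likelihood in Eq. (10) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code. The joint probability for both outcomes equal to 1 is taken as given (it would come from Eq. (2)), and the remaining cell probabilities are obtained from the margins by subtraction. All function names and the numerical values in the check are hypothetical.

```python
import math


def cell_probabilities(p1, p2, q11):
    """Return (q11, q10, q01, q00) for the heterogeneous class.

    Assumption (ours): the cells follow from the margins p1, p2 and q11.
    """
    return q11, p1 - q11, p2 - q11, 1.0 - p1 - p2 + q11


def loglik_contribution(y1, y2, r, p1, p2, q11, e1, e0):
    """Log-likelihood contribution of one subject, following Eq. (10).

    r = 1: both outcomes observed; r = 0: only y1 observed (y2 is ignored).
    e1 and e0 are the probabilities of the heterogeneous class given r.
    """
    if r == 1:
        keys = [(1, 1), (1, 0), (0, 1), (0, 0)]
        q = dict(zip(keys, cell_probabilities(p1, p2, q11)))
        if (y1, y2) == (0, 0):
            # Both latent classes can produce the (0, 0) pattern.
            return math.log(q[(0, 0)] * e1 + 1.0 - e1)
        return math.log(q[(y1, y2)]) + math.log(e1)
    if y1 == 1:
        return math.log(p1) + math.log(e0)
    return math.log(1.0 - p1 * e0)


# Hypothetical parameter values, used only to check that the implied
# probabilities sum to one within each missingness pattern.
params = dict(p1=0.7, p2=0.5, q11=0.4, e1=0.8, e0=0.75)
total_completers = sum(
    math.exp(loglik_contribution(a, b, 1, **params)) for a in (0, 1) for b in (0, 1)
)
total_dropouts = sum(
    math.exp(loglik_contribution(a, 0, 0, **params)) for a in (0, 1)
)
```

Summing these contributions over subjects gives the objective that a quasi-Newton routine would maximize; the constant term for the missingness indicator is omitted because it does not involve the parameters of interest.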

We use the quasi-Newton method to obtain the estimates of ** β**,

The proposed latent class model can also be applied to intermittent missing data, and the likelihood function becomes

$$\begin{aligned} L(\beta ,\mathbf{e},\rho ;\mathbf{y},r\mid \mathbf{x}) = \prod_{i=1}^{n} \Big[ & \big( \mathit{Pr}({\mathbf{y}}_{i}\mid {\mathbf{x}}_{i},{\eta}_{i}=1)\,{e}_{11} + I({Y}_{i1}=0,{Y}_{i2}=0)(1-{e}_{11}) \big)\, I({\mathbf{R}}_{i}={(1,1)}^{\prime}) \\ & + \big( \mathit{Pr}({y}_{i1}\mid {\mathbf{x}}_{i1},{\eta}_{i}=1)\,{e}_{10} + I({Y}_{i1}=0)(1-{e}_{10}) \big)\, I({\mathbf{R}}_{i}={(1,0)}^{\prime}) \\ & + \big( \mathit{Pr}({y}_{i2}\mid {\mathbf{x}}_{i2},{\eta}_{i}=1)\,{e}_{01} + I({Y}_{i2}=0)(1-{e}_{01}) \big)\, I({\mathbf{R}}_{i}={(0,1)}^{\prime}) \Big] L({\mathbf{r}}_{i}). \end{aligned}$$

(11)

Here, **R**_{i} is a 2 × 1 vector, (*R _{i}*

Ten Have et al. (1998) developed a shared parameter model (SPM) with a logistic link for longitudinal binary response data to accommodate informative dropouts. The model includes two components: a longitudinal outcome component and a dropout component. The two components share random effects and are independent after conditioning on them. The shared random effects are assumed to be continuous, and they account for both the correlation within subjects and the correlation between the outcomes and the missing status. This is one of the differences between the shared parameter model and the proposed two-latent-class model, in which the correlation within subjects is modeled separately.

Robins et al. (1995) proposed the weighted GEE model (WGEE), in which the parameter estimates are consistent when the responses are missing at random (MAR). In the weighted GEE model, the standard GEE model is augmented with weights equal to the inverse of the probability of being observed, that is, of remaining in the study, at each time point.

We perform a simulation study comparing the proposed method with the shared parameter model and the weighted GEE model. We generate data by considering two aspects: the logistic model structure for outcomes and the missing structure.

For the outcomes, we consider the case of a binary response measured at two time points, **Y**_{i} = (*Y _{i}*

For the missing structure, we assume a monotone missing data pattern with the binary response at the first time point completely observed. Three missing procedures are considered for the response at time point 2: (i) MAR missing procedure, in which

$$\mathrm{logit}\left[\mathit{Pr}({R}_{i}=0\mid {x}_{1i})\right]=-0.5-0.5{x}_{1i}.$$

Because the missingness depends on the observed data, this missing procedure is missing at random (MAR). (ii) SPM missing procedure. For this missing procedure, note that the *p _{ij}* =

$$\mathrm{logit}\left({p}_{ij}\right)=\beta {x}_{i1}+\beta {x}_{2ij}+\sigma \tau ,$$

where τ is a normal variable with mean = 0 and variance σ^{2} = 1 or σ^{2} = 5^{2} that is, the correlation between *Y _{i}*

$$\mathrm{logit}\left[\mathit{Pr}({R}_{i}=0\mid \tau )\right]=-0.5+\sigma \tau .$$

According to Ten Have et al. (1998), when σ = 1, the dependency of dropouts on the random effect variable is moderate; when σ = 5, the dependency is strong. We want to see if the proposed two-latent-class model can handle different levels of dependency. (iii) TLCM missing procedure, in which we let *Pr*(*R _{i}* = 0) = 0.25. In the data generation procedure, for MAR and SPM missing procedures, we first obtain full data set, then according to the missing structure to delete some
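The three dropout mechanisms can be sketched as follows. This is an illustrative sketch, not the simulation code used for the reported results. Two choices are our assumptions: the covariate in the MAR mechanism is drawn as a fair binary variable, and the random effect is drawn as a standard normal variable that is then scaled by σ. The function names and the seed are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def dropout_prob_mar(x1):
    """Pr(R_i = 0 | x_1i) under the MAR procedure: logit = -0.5 - 0.5 * x_1i."""
    return expit(-0.5 - 0.5 * x1)


def dropout_prob_spm(tau, sigma):
    """Pr(R_i = 0 | tau) under the SPM procedure: logit = -0.5 + sigma * tau."""
    return expit(-0.5 + sigma * tau)


def simulate_dropout(n, procedure, sigma=1.0, seed=0):
    """Return a list of R_i values (1 = observed at time 2, 0 = dropped out)."""
    rng = random.Random(seed)
    indicators = []
    for _ in range(n):
        if procedure == "MAR":
            x1 = rng.randint(0, 1)      # assumed binary covariate (ours)
            prob = dropout_prob_mar(x1)
        elif procedure == "SPM":
            tau = rng.gauss(0.0, 1.0)   # assumed standard normal random effect (ours)
            prob = dropout_prob_spm(tau, sigma)
        elif procedure == "TLCM":
            prob = 0.25                 # Pr(R_i = 0) = 0.25
        else:
            raise ValueError("unknown procedure: " + procedure)
        indicators.append(0 if rng.random() < prob else 1)
    return indicators


# One replication of size 200 under strong dependence on the random effect.
r_spm = simulate_dropout(200, "SPM", sigma=5.0, seed=1)
```

In a full replication the same random effect would also enter the outcome model, which is what makes the dropout informative; only the missingness step is shown here.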

For the entire simulation study, we consider the three types of missing procedures and a sample size of 200 with 1,000 replications. Because the proposed model divides outcomes into two latent classes, in one of which the outcomes cannot be modeled, we cannot obtain marginal estimates of the parameters. The comparisons between our proposed model and the other models are therefore performed using the area under the Receiver Operating Characteristic (ROC) curve. We use the trapezoidal rule to calculate the area under the curve (AUC); this is a nonparametric method that approximates the area by constructing trapezoids under the curve. The summary measures for each model are the mean AUC and the standard error of the mean AUC over 1,000 replications, which are based on estimated outcomes *Y _{i}*

Simulation results: mean AUC and standard error for the three models under different missing procedures

Comparing the three methods, the overall conclusions are as follows: the two-latent-class model performs well under all missing procedures; the shared parameter model performs well in most cases, except under the TLCM missing procedure with small *e*_{1} and *e*_{0}; and the weighted GEE model performs well when the missing procedure is MAR or the missingness is only weakly informative.

In this section, we illustrate the proposed two-latent-class model, the shared parameter model, and the weighted GEE model using an example from the women’s smoking cessation study (Perkins et al., 2001). This is a longitudinal study designed to assess the effect of weight concern on smoking cessation in women. At enrollment, 219 women met the eligibility criteria. All of the participants were randomly assigned to one of three groups: (i) behavioral weight control to prevent weight gain (weight control); (ii) cognitive-behavioral therapy to reduce weight concerns (CBT); or (iii) nonspecific social support (standard), which was a control group and involved no discussion of weight. Each of the three interventions consisted of ten 90-minute sessions provided over 7 weeks, with two sessions per week during the first 3 weeks and one session per week over the next 4 weeks. Participants were instructed to quit smoking after the fourth session. Follow-up sessions were scheduled at 3, 6, and 12 months postquit for assessment purposes; no treatment was provided in these periods. In this trial, the repeated binary response of interest is whether the participant is in continuous abstinence (1 = yes, 0 = no). Here, continuous abstinence was defined as no relapse since the quit day, and relapse was defined as a self-report of 7 consecutive days of any smoking at all or an expired-air carbon monoxide (CO) level greater than 8 ppm, as widely recommended (Ossip-Klein et al., 1986).

In this study, we focus on the outcomes at two time points, 4-week postquit (*Y*_{1}) and 12-month postquit (*Y*_{2}). The 57 women who had missing data at 4-week postquit (also missing at 12-month postquit) were removed from all analyses, leaving 162 subjects (116 subjects have no missing data; 46 subjects were observed at 4-week postquit and missing at 12-month postquit). To identify significant covariates related to outcomes, we carried out a preliminary analysis by using standard GEE (Liang and Zeger, 1986). The results showed that the following variables should be included in the models: group (*weight control, CBT*) (‘standard’ as a control group), time (t) and age at first cigarette (*age*). We also fit a generalized additive model (GAM) for the outcomes, *Y*_{1}, with the covariate, ‘age at first cigarette’, that is, *logit*(*E*(*Y*_{1})) = *s*_{0} + *s*_{1} (*age*), where *s _{i}*(·),

Plots of smooth functions in a generalized additive model for *Y*_{1} with ‘age at first cigarette (age)’ as covariate for the smoking cessation study. (The dashed lines indicate plus and minus two pointwise standard deviations; the number **...**

A summary of outcomes is given in Table 2. It shows that 28.40% of the outcomes are missing at the 12-month follow-up. The abstinence rate among subjects observed at the 12-month follow-up is 53/(53 + 63) = 45.69%, which is much lower than the abstinence rate (72.84%) at 4 weeks postquit. This suggests that dropout might be related to the unobserved second-time outcomes. Based on these results, it is reasonable to consider nonignorable missingness.

In Table 3, we present the parameter estimates, standard errors, and the *Z*-values calculated from them for the three models. These parameter estimates are the fixed effects common to the three models. The results show that ‘time’ is significant in the models, reflecting the decreasing abstinence rate over time. All of the analyses show that the ‘CBT’ group has a higher abstinence rate than the ‘standard’ group. The results from the weighted GEE model also show that the squared term of ‘age at first cigarette’ is significant, whereas this factor is not significant in the other models. In the two-latent-class model, the estimated within-subject correlation is *ρ* = 0.347 (SE = 0.189, *Z* = 1.84). Table 4 gives the estimates of *e*_{1} and *e*_{0}. It indicates that subjects with missing outcomes have a greater probability (1 − 0.780 = 0.220) of being in the special status, that is, of being stubborn smokers, than subjects without missing outcomes (1 − 0.817 = 0.183). Using the estimates and standard errors in Table 4, we test the difference between *e*_{1} and *e*_{0}; the *t*-test statistic is 4.35 with 160 degrees of freedom (*p*-value < 0.001), so subjects who drop out of the study have a significantly higher probability of being in the ‘homogeneous’ latent class, that is, of smoking. From ${\widehat{e}}_{1}$ and ${\widehat{e}}_{0}$, we obtain ${\widehat{\alpha}}_{1}=0.010$ and ${\widehat{\alpha}}_{0}=0.550$.

Marginal parameter estimates, estimated standard errors, and Z-values for the smoking cessation study (Modeling Pr(abstinent))

Estimates, estimated standard errors, and Z-values for the latent classes, *e*_{1} and *e*_{0}, under the proposed method for the smoking cessation study (Z-value is for testing *H*_{0} : *e*_{1} = 1 or *H*_{0} : *e*_{0} = 1)

We compare the performance of the proposed two-latent-class model, the shared parameter model, and the weighted GEE model in terms of the empirical ROC curve and its area under the curve (AUC). The calculations are based on estimated outcomes at three months by each model and the observed data as a gold standard. We summarize the estimated ROC curves for the three methods in Fig. 2. The corresponding areas under the ROC curves, with standard errors, for the two-latent-class model, the shared parameter model, and the weighted GEE model are 0.865 ± 0.030, 0.623 ± 0.046, and 0.629 ± 0.046, respectively. The AUC and its standard error are computed using the trapezoidal rule and the variance of the Wilcoxon statistic (Hanley and McNeil, 1982). According to the method given by Hanley and McNeil (1983), the critical ratio between the areas estimated by the two-latent-class model and the shared parameter model is 4.92 (*p*-value < 0.001). The critical ratio between the areas estimated by the two-latent-class model and the weighted GEE model is 4.77 (*p*-value < 0.001). It appears that the proposed two-latent-class model fits the data significantly better than the other methods.
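The AUC computations described above can be sketched as follows. This is a minimal illustration, not the code used to produce the reported numbers. The standard error uses the closed-form approximation from Hanley and McNeil (1982), and the critical ratio follows the form in Hanley and McNeil (1983), where *r* is the correlation between the two estimated areas. Whether the reported standard errors were obtained with exactly this variant is an assumption on our part, and the function names and toy data are hypothetical.

```python
import math


def roc_auc_trapezoid(labels, scores):
    """Area under the empirical ROC curve by the trapezoidal rule.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model-based estimates, larger meaning more likely positive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of the AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


def critical_ratio(auc_a, se_a, auc_b, se_b, r):
    """Critical ratio for two correlated AUCs (Hanley and McNeil, 1983)."""
    return (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)


# Toy data, hypothetical and unrelated to the smoking cessation study.
toy_auc = roc_auc_trapezoid([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The correlation *r* between the areas is not reported in the text, so the sketch takes it as an input rather than attempting to reproduce the reported critical ratios.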

Pregibon (1981) extended regression diagnostics to logistic regression. Following his article, the component of the χ^{2} goodness-of-fit statistic for subject *i* at time point *j* is

$${\chi}_{ij}=\frac{{Y}_{ij}-{\widehat{\alpha}}_{ij}}{\sqrt{{\widehat{\alpha}}_{ij}(1-{\widehat{\alpha}}_{ij})}},$$

where *α _{ij}* =

$$\begin{array}{cc}\hfill {\widehat{\alpha}}_{ij}& =\mathit{Pr}({Y}_{ij}=1\mid {\eta}_{i}=1)\mathit{Pr}({\eta}_{i}=1)+\mathit{Pr}({Y}_{ij}=1\mid {\eta}_{i}=0)\mathit{Pr}({\eta}_{i}=0)\hfill \\ \hfill & ={\widehat{p}}_{ij}\mathit{Pr}({\eta}_{i}=1),\hfill \end{array}$$

where ${\widehat{p}}_{ij}$ is estimated under Eq. (1), and

$$\begin{aligned} \mathit{Pr}({\eta}_{i}=1) &= \mathit{Pr}({\eta}_{i}=1\mid {R}_{i}=1)\,\mathit{Pr}({R}_{i}=1)+\mathit{Pr}({\eta}_{i}=1\mid {R}_{i}=0)\,\mathit{Pr}({R}_{i}=0) \\ &= {\widehat{e}}_{1}\,\mathit{Pr}({R}_{i}=1)+{\widehat{e}}_{0}\,\mathit{Pr}({R}_{i}=0). \end{aligned}$$

Figure 3 shows index plots of the χ^{2} components for the two-latent-class model at each time point. Asymptotic arguments concerning the distribution of the χ^{2} goodness-of-fit statistic suggest that χ_{ij} should be approximately *N*(0, 1). Observations with values outside ±2 are not well accounted for by the fitted model (Kay and Little, 1986). The figure shows that very few subjects are poorly fit by the two-latent-class model.
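The diagnostic can be sketched numerically. This is a minimal illustration rather than the authors' code. Following the definitions in Eqs. (5) and (6), the probability of the heterogeneous class given an observed second outcome is paired with the proportion of completers, and the probability given dropout with the proportion of dropouts. The class probabilities 0.817 and 0.780 and the counts 116 of 162 are taken from the application; the fitted probability 0.15 is a hypothetical value chosen only to show how an observation gets flagged.

```python
import math


def marginal_class_prob(e1, e0, pr_observed):
    """Pr(eta_i = 1) = e1 * Pr(R_i = 1) + e0 * Pr(R_i = 0)."""
    return e1 * pr_observed + e0 * (1.0 - pr_observed)


def chi_component(y, p_hat, class_prob):
    """Chi-square component (y - alpha) / sqrt(alpha * (1 - alpha)).

    alpha = p_hat * Pr(eta = 1), because the homogeneous class
    contributes zero probability to the outcome y = 1.
    """
    alpha = p_hat * class_prob
    return (y - alpha) / math.sqrt(alpha * (1.0 - alpha))


# Estimates reported in the application: e1 = 0.817, e0 = 0.780,
# with 116 of 162 subjects observed at the second time point.
class_prob = marginal_class_prob(0.817, 0.780, 116 / 162)

# Hypothetical subject: abstinent (y = 1) despite a small fitted probability.
component = chi_component(1, 0.15, class_prob)
flagged = abs(component) > 2
```

An abstinent subject with a small fitted probability yields a large positive component, which is the pattern the ±2 rule is meant to detect.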

Table 5 lists observations *i* = 114, 127, 135, 143, and 151, which are poorly fit by the two-latent-class model. These subjects share the following characteristics: they were 11–13 years old at first cigarette, they belonged to the nonspecific social support treatment group, and they were abstinent at 12 months postquit. Based on the results in Table 3, the estimated probabilities of abstinence for these subjects are very small, which explains why these five subjects are outliers.

In this article, we develop a two-latent-class model for categorical responses with nonignorable dropouts, in which subjects are divided into two latent classes, one of which is deterministic. The dropout time is related to the latent class, whose probability is estimated by maximum likelihood. For the relationship within subjects, we use the tetrachoric correlation (le Cessie and van Houwelingen, 1994) in the estimation.

The simulation results and the application to the smoking cessation data provide support for the use of the proposed two-latent-class model when compared with the shared parameter model and the weighted GEE model. The simulation results indicate that the proposed model generally performs well under various missing procedures. Although the shared parameter model also performs well in most cases, the two-latent-class model is a better choice when the data are heterogeneous, with outcomes in different strata. In the application to the smoking cessation data, the parameter estimates of the proposed model for the subjects in the heterogeneous class are similar to those of the shared parameter model for all subjects. However, the proposed model treats the data set as divided into different latent classes, which results in a larger AUC; that is, the proposed model fits the data better. Because we consider only the two-time-point paradigm in the simulation and the application, these conclusions are limited in scope.

In the two-latent-class model, the latent variable is used to induce conditional independence between the outcome and the missing status, so that standard likelihood techniques can be used to derive the estimators. Although the two-latent-class model can be considered a type of pattern mixture model, the difference is that in the two-latent-class model the ‘pattern’ is based not on missing patterns but on the strata behind the data set. In the smoking cessation data, for example, we assume that there exist two classes defined by how stubborn the smoking behavior is. Here, we focused on monotone missingness, but the method can also be used for intermittent missing data in the same way.

One advantage of our proposed method is that its calculation is easy, because there are only two latent classes and in one of them the outcomes are deterministic. Another advantage is that, according to the simulation results, the two-latent-class model can be used under various missing procedures. Our method can be extended to the situation of multivariate outcomes, that is, when the total number of time points, *J*, is larger than 2. As *J* increases, the calculation of the tetrachoric correlation becomes complicated. One solution to this problem is to use the product of all pairwise likelihoods within a subject instead of the true contribution of the subject to the likelihood (le Cessie and van Houwelingen, 1994). For example, the contribution of the *i*th subject to the log-likelihood is

$${l}_{i}=\frac{1}{J-1}\sum _{j=1}^{J}\sum _{{j}^{\prime}=1}^{j}{l}_{j{j}^{\prime}},$$

(12)

where *l _{jj’}* is the pairwise likelihood of (

This research was supported by National Institute on Drug Abuse R01 DA04174.

Differentiation of the log-likelihood function (10) with respect to ** β**,

$$\begin{aligned} \frac{\partial l}{\partial \beta} ={}& \sum_{i=1}^{n} \left[ \frac{{y}_{i11}}{{q}_{i11}}\frac{\partial {q}_{i11}}{\partial \beta} + \frac{{y}_{i10}}{{q}_{i10}}\frac{\partial {q}_{i10}}{\partial \beta} + \frac{{y}_{i01}}{{q}_{i01}}\frac{\partial {q}_{i01}}{\partial \beta} + \frac{{y}_{i00}{e}_{1}}{{q}_{i00}{e}_{1}+1-{e}_{1}}\frac{\partial {q}_{i00}}{\partial \beta} \right] I({R}_{i}=1) \\ & + \sum_{i=1}^{n} \left( \frac{{y}_{i1}}{{p}_{i1}}\frac{\partial {p}_{i1}}{\partial \beta} - \frac{(1-{y}_{i1}){e}_{0}}{1-{p}_{i1}{e}_{0}}\frac{\partial {p}_{i1}}{\partial \beta} \right) I({R}_{i}=0), \\ \frac{\partial l}{\partial \rho} ={}& \sum_{i=1}^{n} \left( \frac{{y}_{i11}}{{q}_{i11}} - \frac{{y}_{i10}}{{q}_{i10}} - \frac{{y}_{i01}}{{q}_{i01}} + \frac{{y}_{i00}{e}_{1}}{{q}_{i00}{e}_{1}+1-{e}_{1}} \right) \frac{\partial {q}_{i11}}{\partial \rho}\, I({R}_{i}=1), \\ \frac{\partial l}{\partial {e}_{1}} ={}& \sum_{i=1}^{n} \left( \frac{1-{y}_{i00}}{{e}_{1}} - \frac{{y}_{i00}(1-{q}_{i00})}{{q}_{i00}{e}_{1}+1-{e}_{1}} \right) I({R}_{i}=1), \\ \frac{\partial l}{\partial {e}_{0}} ={}& \sum_{i=1}^{n} \left( \frac{{y}_{i1}}{{e}_{0}} - \frac{(1-{y}_{i1}){p}_{i1}}{1-{p}_{i1}{e}_{0}} \right) I({R}_{i}=0). \end{aligned}$$

Here, we have used that ∂*q*_{i10}/∂*ρ* = ∂(*p*_{i1} − *q*_{i11})/∂*ρ* = −∂*q*_{i11}/∂*ρ*, etc. To evaluate $\frac{\partial {q}_{i11}}{\partial \beta}$ and $\frac{\partial {q}_{i11}}{\partial \rho}$, from Eq. (2), ${q}_{i11}={\int}_{-\infty}^{{g}_{i1}}{\int}_{-\infty}^{{g}_{i2}}f({t}_{1},{t}_{2},\rho )d{t}_{2}d{t}_{1}$, we have

$$\frac{\partial {q}_{i11}}{\partial \beta}=\Phi \left\{\frac{{g}_{i2}-\rho {g}_{i1}}{\sqrt{(1-{\rho}^{2})}}\right\}\frac{\partial {p}_{i1}}{\partial \beta}+\Phi \left\{\frac{{g}_{i1}-\rho {g}_{i2}}{\sqrt{(1-{\rho}^{2})}}\right\}\frac{\partial {p}_{i2}}{\partial \beta}$$

and

$$\frac{\partial {q}_{i11}}{\partial \rho}=f({g}_{i1},{g}_{i2},\rho ),$$

where Φ is the standard normal cumulative distribution function.
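The identity for the derivative with respect to *ρ* can be checked numerically. The sketch below is our own illustration, not part of the original derivation. It evaluates the tetrachoric series of Eq. (2) using the three-term recurrence for the Hermite polynomials (which gives the same polynomials as Eq. (3)), and compares a central finite difference in *ρ* with the bivariate normal density. The threshold values, the step size, and the truncation at 60 terms are arbitrary choices.

```python
import math


def normal_pdf(u):
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)


def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def bivariate_normal_pdf(t1, t2, rho):
    """Standardized bivariate normal density f(t1, t2, rho)."""
    one_minus = 1.0 - rho * rho
    expo = -(t1 * t1 - 2.0 * rho * t1 * t2 + t2 * t2) / (2.0 * one_minus)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(one_minus))


def q11_series(g1, g2, rho, n_terms=60):
    """Tetrachoric series for q11, with H_{k+1}(v) = v H_k(v) - k H_{k-1}(v)."""
    h1_prev, h1 = 1.0, g1          # H_0 and H_1 at g1
    h2_prev, h2 = 1.0, g2          # H_0 and H_1 at g2
    scale = rho                    # rho^(k+1) / (k+1)! for k = 0
    total = scale                  # k = 0 term, since H_0 = 1
    for k in range(1, n_terms):
        scale *= rho / (k + 1)
        total += h1 * h2 * scale
        h1_prev, h1 = h1, g1 * h1 - k * h1_prev
        h2_prev, h2 = h2, g2 * h2 - k * h2_prev
    return normal_cdf(g1) * normal_cdf(g2) + normal_pdf(g1) * normal_pdf(g2) * total


# Arbitrary illustrative values.
g1, g2, rho, step = 0.3, -0.2, 0.4, 1e-5
finite_difference = (q11_series(g1, g2, rho + step) - q11_series(g1, g2, rho - step)) / (2.0 * step)
density = bivariate_normal_pdf(g1, g2, rho)
```

Agreement between `finite_difference` and `density` to several decimal places is what the identity predicts; the same finite-difference idea can be used to spot-check the other score components.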

The elements of the observed information matrix are obtained by taking the second-order derivatives on the log-likelihood function (10),

$$\begin{aligned} \frac{{\partial}^{2}l}{\partial {\beta}^{2}} ={}& -\sum_{i=1}^{n} \left\{ \frac{{y}_{i11}}{{q}_{i11}^{2}}\left(\frac{\partial {q}_{i11}}{\partial \beta}\right){\left(\frac{\partial {q}_{i11}}{\partial \beta}\right)}^{\prime} + \frac{{y}_{i10}}{{q}_{i10}^{2}}\left(\frac{\partial {q}_{i10}}{\partial \beta}\right){\left(\frac{\partial {q}_{i10}}{\partial \beta}\right)}^{\prime} + \frac{{y}_{i01}}{{q}_{i01}^{2}}\left(\frac{\partial {q}_{i01}}{\partial \beta}\right){\left(\frac{\partial {q}_{i01}}{\partial \beta}\right)}^{\prime} + \frac{{y}_{i00}{e}_{1}^{2}}{{({q}_{i00}{e}_{1}+1-{e}_{1})}^{2}}\left(\frac{\partial {q}_{i00}}{\partial \beta}\right){\left(\frac{\partial {q}_{i00}}{\partial \beta}\right)}^{\prime} \right\} I({R}_{i}=1) \\ & -\sum_{i=1}^{n} \left\{ \frac{{y}_{i1}}{{p}_{i1}^{2}}\left(\frac{\partial {p}_{i1}}{\partial \beta}\right){\left(\frac{\partial {p}_{i1}}{\partial \beta}\right)}^{\prime} + \frac{(1-{y}_{i1}){e}_{0}^{2}}{{(1-{p}_{i1}{e}_{0})}^{2}}\left(\frac{\partial {p}_{i1}}{\partial \beta}\right){\left(\frac{\partial {p}_{i1}}{\partial \beta}\right)}^{\prime} \right\} I({R}_{i}=0), \end{aligned}$$

**Mathematics Subject Classification** Primary 62H30; Secondary 62J12.

- Ashford JR, Sowden RR. Multivariate probit analysis. Biometrics. 1970;26:535–546.
- Bergen AW, Caporaso N. Cigarette smoking. J. Nat. Cancer Instit. 1999;16:1365–1375.
- Byrd RH, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM J. Scientific Comput. 1995;16:1190–1208.
- Emrich LJ, Piedmonte MR. A method for generating high-dimensional multivariate binary variates. Amer. Statistician. 1991;45:302–304.
- Garret ES, Zeger SL. Latent class model diagnosis. Biometrics. 2000;56:1055–1067.
- Hadgu A, Qu Y. A biomedical application of latent class models with random effects. Appl. Statist. 1998;47:603–616.
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
- Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148:839–843.
- Kay R, Little S. Assessing the fit of the logistic model: a case study of children with the haemolytic uraemic syndrome. Appl. Statist. 1986;35:16–30.
- le Cessie S, van Houwelingen JC. Logistic regression for correlated binary data. Appl. Statist. 1994;43:95–108.
- Liang K, Zeger S. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
- Lin H, McCulloch CE, Turnbull BW, Slate EH, Clark LC. A latent class mixed model for analyzing biomarker trajectories with irregularly scheduled observations. Statist. Med. 2000;16:1303–1318.
- Ossip-Klein DJ, Bigelow G, Parker SR, Curry S, Hall S, Kirkland S. Classification and assessment of smoking behavior. Health Psychol. 1986;5(Suppl.):3–11.
- Perkins KA, Marcus MD, Levine MD, D’Amico D, Miller A, Broge M, Ashcom J. Cognitive-behavioral therapy to reduce weight concerns improves smoking cessation outcome in weight-concerned women. J. Consult. Clin. Psycho. 2001;69:604–613.
- Pregibon D. Logistic regression diagnostics. Ann. Statist. 1981;9:705–724.
- Reboussin BA, Miller ME, Lohman KK, Ten Have TR. Latent class models for longitudinal studies of the elderly with data missing at random. Appl. Statist. 2002;51:69–90.
- Robins JM, Rotnetzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Amer. Statis. Assoc. 1995;90:106–121.
- Roy J. Modeling longitudinal data with nonignorable dropouts using a latent dropout class model. Biometrics. 2003;59:829–836.
- Ten Have TR, Kunselman AR, Pulkstenis EP, Landis JR. Mixed effects logistic regression models for longitudinal binary response data with informative dropout. Biometrics. 1998;54:367–383.
