Before we introduce the model, we first go through some additional notation needed for the latent class component. Define *S*_{i} = (*S*_{i1}, ..., *S*_{iM})^{T} to be a vector of latent indicators, where *S*_{ij} is an indicator for class *j*, *j* = 1, ..., *M* (*M* < *T*); e.g., if subject *i* is in class *j*, then *S*_{ij} = 1 and *S*_{ij′} = 0 for all *j′* ≠ *j*. The idea here is to “group” the dropout times into the *M* classes as in Roy (2003).

All of the parameters in the following specification are functions of the number of latent classes, *M*; for example, *β*^{(M)}. However, we suppress the superscripts without loss of clarity in what follows. First, we specify the marginal mean as in (3). By marginal, we mean marginalized over the subject-specific random effects *and* over the latent class distribution (and hence implicitly over the dropout distribution as well). If the number of classes *M* were known, then the parameters *β* would be of primary interest. We address the issue of *M* being unknown below.
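Since the displayed equation is not reproduced here, the following is a plausible sketch of the form of (3), assuming a known link function *g* and a covariate vector *X*_{it} (these specific symbols are our assumptions, not necessarily the original notation):

```latex
% Sketch of the marginal mean model (3), marginalized over b_i and S_i:
g\{\mathrm{E}(Y_{it})\} = X_{it}^{T}\beta
```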

In order to fully account for correlation due to repeated observations and informative censoring, we specify a conditional model in addition to the marginal model. Recall that we are taking a pattern mixture modeling approach to account for dropout. We assume that the relevant information in *D* is captured by the latent variable *S*. We therefore specify a mixture distribution over these latent classes, as opposed to over *D* itself. Before proceeding to describe the model, however, we first make two points. First, the parameters from the conditional model are not of scientific interest, and in fact are viewed as nuisance parameters; we are not interested in estimating subject-specific effects (i.e., effects conditional on the random effects) or class-specific covariate effects (i.e., effects of covariates on *Y* given a particular dropout class). Second, we must specify the conditional model in a way that is compatible with the marginal model (3). As we will see below, this leads to a somewhat complicated model. Specifying this conditional model is necessary, however, in order to account for the two types of dependencies (within-subject correlation and dependency between the outcome and dropout time).

We assume the data *Y*_{it}, conditional on random effects *b*_{i} and latent class *S*_{i}, are from an exponential family in which E(*Y*_{it}|*b*_{i}, *S*_{i}) = *g*^{–1}(*η*_{it}) = *ψ*′(*η*_{it}), *η*_{it} is the linear predictor, *ψ*(·) is a known function, *φ* is a scale parameter, and *m*_{i} is the prior weight. This family includes the normal (*ψ*(*x*) = *x*^{2}/2), binomial (*ψ*(*x*) = log(1 + *e*^{x})), and Poisson (*ψ*(*x*) = *e*^{x}) distributions, among others. The conditional mean is specified as in (4), where, in the most general form of the model, we allow the variance of *b*_{i} to depend on the latent class, that is, [*b*_{i}|*S*_{ij} = 1] ~ *N*(0, *θ*_{j}). For identifiability, we use a sum-to-zero constraint on the *α*'s, namely, ∑_{j} *α*^{(j)} = 0. In this conditional model, each subject has its own intercept, and the effect of each covariate, *Z*_{itj} (*Z*_{it} ⊆ *X*_{it}), is allowed to differ by dropout class via the regression coefficients, *α*^{(j)}.
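As a sketch of the pieces just described (the displayed equations are not reproduced here; the scale parameter *φ* and the exact functional form are our assumptions, written in the conventional exponential-family notation), the density and a plausible form of the conditional mean model (4) are:

```latex
% Exponential-family density for Y_it given b_i and S_i (sketch):
f(y_{it}\mid b_i, S_i)
  = \exp\!\left[\frac{m_i}{\phi}\bigl\{y_{it}\,\eta_{it} - \psi(\eta_{it})\bigr\}
      + c(y_{it},\phi)\right],
% Conditional mean model, a plausible form of (4), for latent class j:
g\{\mathrm{E}(Y_{it}\mid b_i, S_{ij}=1)\}
  = \eta_{it} = \Delta_{it} + b_i + Z_{it}^{T}\alpha^{(j)},
\qquad b_i \mid S_{ij}=1 \sim N(0,\theta_j).
```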

The probabilities of the latent classes given the dropout time are specified as a proportional odds model (5), where *λ*_{01} ≤ *λ*_{02} ≤ ··· ≤ *λ*_{0,M–1} and *λ*_{1} are unknown parameters. From this regression (5) it is clear that the class probabilities are a monotone function of dropout time (in fact, linear on the logit scale). Finally, the dropout times, *D*_{i}, follow a multinomial distribution with mass at each of the possible dropout times, parameterized by *γ*.
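A plausible form of the proportional odds specification (5), assuming it is written on cumulative class-membership probabilities (the exact parameterization is our assumption):

```latex
% Proportional odds model for latent class given dropout time (sketch of (5)):
\operatorname{logit}\Pr\Bigl(\textstyle\sum_{k\le j} S_{ik}=1 \,\Big|\, D_i\Bigr)
  = \lambda_{0j} + \lambda_{1} D_i, \qquad j = 1,\ldots,M-1.
```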

We point out that in the above formulation, *Y*_{it} is independent of *D*_{i} given *S*_{i}. This is a key assumption of this approach, which we examine in Section 3.4.
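This conditional independence implies that mixing takes place over the latent classes rather than over the dropout times themselves, which can be sketched as:

```latex
% Y independent of D given S: the mixture is over classes, not dropout times.
f(y_{it}\mid D_i) = \sum_{j=1}^{M} f(y_{it}\mid S_{ij}=1)\,\Pr(S_{ij}=1\mid D_i).
```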

The intercept Δ_{it} in (4) is determined by the relationship between (3) and (4); namely, it is the solution of the equation obtained by setting the marginal mean in (3) equal to the conditional mean in (4) averaged over the random effects and the latent classes.
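To make this marginal–conditional constraint concrete, the following is a minimal numerical sketch (not the paper's code) for a binary outcome with a logit link: it solves for Δ_{it} so that the conditional model, averaged over a normal random intercept (via Gauss–Hermite quadrature) and over the latent classes, reproduces a given marginal mean. All parameter values, and the function name `marginal_mean`, are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

# Illustrative sketch: find the intercept Delta_it such that the conditional
# logistic model, marginalized over b_i ~ N(0, theta_j) and over the latent
# classes, matches a target marginal mean. All numbers below are made up.

def marginal_mean(delta, class_probs, alphas, thetas, z=1.0, n_gh=40):
    """Marginal E(Y_it) implied by Delta_it under a logit link."""
    # Probabilists' Gauss-Hermite nodes/weights for E{f(b)}, b ~ N(0, 1).
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_gh)
    weights = weights / weights.sum()
    mean = 0.0
    for pj, aj, thj in zip(class_probs, alphas, thetas):
        b = nodes * np.sqrt(thj)  # rescale nodes to N(0, theta_j)
        mean += pj * np.sum(weights * expit(delta + b + z * aj))
    return mean

# Hypothetical setup: M = 2 classes.
class_probs = [0.6, 0.4]
alphas = [0.5, -0.5]       # class effects satisfying the sum-to-zero constraint
thetas = [1.0, 2.0]        # class-specific random-intercept variances
target = expit(0.3)        # marginal mean implied by, say, X_it^T beta = 0.3

# The marginal mean is monotone in delta, so a 1-D root-finder suffices.
delta = brentq(lambda d: marginal_mean(d, class_probs, alphas, thetas) - target,
               -10, 10)
print(round(delta, 3))
```

In practice, an equation of this type would be solved for each subject and time point, so Δ_{it} varies with both the covariates and the class-specific parameters.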

The main target of inference typically will be the covariate effects averaged over classes, that is, *β*^{(M)} averaged over *M*. We denote this as *β** = ∑_{m} *β*^{(m)} *p*(*m*|*y*). We discuss computation of *p*(*m*|*y*), and the corresponding computation of *β**, in Section 3.3.