

Stat Med. Author manuscript; available in PMC 2011 February 2.

PMCID: PMC3032542

NIHMSID: NIHMS261274

Biomedical research is plagued with problems of missing data, especially in clinical trials of medical and behavioral therapies adopting longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts with or without further imputation. Depending on the choice of imputation model or measurement model, there exist various strategies that can be jointly applied to the same set of data to study the effect of treatment or intervention from multi-faceted perspectives. For illustration, the strategies were applied to a data set with continuous repeated measures from a smoking cessation clinical trial.

In biomedical research with clinical trials, the effectiveness of a treatment or intervention method is often investigated by adopting a longitudinal design where each subject is repeatedly measured on the variables of interest throughout a period of time. In such studies, there are often missing values reflecting the problematic nature of the phenomenon under study, such as substance abuse [1] and mental health disorder [2]. The proportion of missingness is sometimes notably large, e.g. 70 per cent at termination in a randomized trial of buprenorphine *versus* methadone in treating addiction to cocaine use [3]. Although investigators may devote substantial effort to minimizing the number of missing values, some amount of missingness is often inevitable in practice.

A convenient approach to analyzing incomplete longitudinal data is to ignore the missing values and fit a model to all available data. Currently, three groups of longitudinal models are popularly used in this way: marginal models with generalized estimating equations (GEE) for studying the group-averaged characteristics [4], mixed models with random effects used to describe heterogeneity among individuals [5], and transition models, which model the sequence of responses from a person dynamically by conditioning on previous observations and baseline features [6]. Although different assumptions are required by each modeling option, they essentially require that missing values be at least ‘*ignorable*,’ i.e. the indicator variable for whether a measure is missing is independent of the value of that measure given other observed measures and covariates [7, 8]. Here, the independence between the missingness indicators and missing values also requires that the parameters modeling the repeated measures and those modeling the mechanism of missingness are distinct. Without specifying likelihood functions, the method of GEE offers a flexible modeling strategy. Although the standard version of GEE requires a stronger assumption, various adjusting methods have been proposed to handle ignorable missing values [9].

Unfortunately, there are some studies where empirical evidence suggests that *ignorability* is implausible [10–12]. In this case, when standard modeling options are used, invalid statistics or biased estimators may be obtained. Hence, advanced modeling strategy assuming *nonignorable missingness* becomes desirable. In previous years, several lines of modeling development for nonignorable missingness have been initiated based on modeling the joint distribution of the indicators of missingness and the values of repeated measures (including observed and missing values). The likelihood function derived from the joint distribution is called the full-likelihood function [13]. As summarized by Li and coworkers [14], at least three factorizations of the joint distribution could be considered: (1) outcome-dependent factorization, where missingness indicators are assumed to be conditioned on the values of repeated measures; (2) pattern-dependent factorization, where the distribution of repeated measure values is a mixture of distributions for subjects within sub-groups determined by the patterns of missingness; and (3) parameter-dependent factorization, where repeated measure values and missingness indicators are conditionally independent of each other given a group of shared parameters. Correspondingly, three models could be conceived according to the way of factorization and are termed, respectively, as selection models, pattern-mixture models, and shared-parameter models.

Compared with intermittent missingness (occasional omission), dropout (premature withdrawal) usually leads to a larger proportion of missingness, and the mechanism for dropout is often associated with both missing and observed values (as well as baseline covariates such as treatment assignment); hence, dropout is more problematic. To deal with nonignorable dropouts (i.e. missing values after withdrawal), Diggle and Kenward [10] proposed the selection-model approach while comparing diets of cows for increasing their milk protein content. Using the data set from a multi-center trial with a parallel group design to study the efficacy of Vorozole for treating breast cancer, Molenberghs and colleagues [15] developed strategies for fitting pattern-mixture models for dropouts. For repeatedly measured count data subject to dropout, Albert and Follmann [16] introduced the prototype of a shared-parameter model. For a detailed review on modeling dropout mechanisms, refer to Little [17]. The presence of intermittent missing values in addition to dropouts further complicates the modeling procedures. Unlike dropouts, the patterns of intermittent missingness can take any nonmonotonic form; intermittent missingness is therefore conceptually easy but technically difficult to handle. Troxel and coauthors [18] extended the selection model to include the case of intermittent missing values. A breakthrough contribution was given by Albert and Follmann [19], who proposed the shared-random-effects Markov transition model (REMTM) to deal with nonignorable missingness in longitudinal binary data; this model was further generalized by Li *et al.* [14] to accommodate Poisson-distributed count measures.

Since the subjects in question still remain in the study, we may be able to assume that intermittent missingness is ignorable [6] when the proportion of missingness is moderate. Adopting this point of view, Yang and Shoptaw [20] developed the idea of *multiple partial imputation* (MPI) to assess dropout mechanisms when there are intermittent missing values. Within the framework of MPI, intermittent missing values are first assumed to be ignorable and imputed using an outcome-dependent modeling technique; then the partially imputed data sets are analyzed to investigate the dropout mechanism; and finally, the multiple versions of assessment are combined to make one final set of inferential statements.

The approach of MPI provides a much more general framework, not only for data exploration but also for analysis purposes. This article proposes strategies for implementing incomplete longitudinal models to analyze continuous repeated measures with intermittent missing values and dropouts. The article is organized as follows. We begin by introducing a motivating practical data set from a smoking cessation clinical trial where a moderate amount of missing values is seen. In Section 2, we discuss modeling strategies based on full-likelihood functions for incomplete longitudinal data, with special attention to handling nonignorable dropouts. In Section 3, we propose imputation-based strategies and Markov chain Monte Carlo (MCMC) algorithms for implementing various incomplete longitudinal models. The smoking cessation data set is then analyzed using the above strategies, and finally we give some practical guidelines for using the proposed imputation-based strategies.

The development of this work was closely related to the analysis of a clinical trial of smoking cessation in methadone-maintained tobacco smokers [21]. This study tested the effectiveness of a relapse prevention (RP) program and a contingency management (CM) program, alone and in combination, for improving smoking cessation outcomes using nicotine transdermal pharmacotherapy. A total of 174 participants were randomly assigned to one of the four behavioral treatment groups: a control group that received no behavioral therapy (42 subjects); RP-only (42 subjects); CM-only (43 subjects); and a combined RP+CM condition (47 subjects). Thirty-six measures of carbon monoxide levels in expired breath were scheduled to be taken on each participant over the 12-week study period, three times per week.

Figure 1 depicts the mean values of observed carbon monoxide levels for the four treatment groups, after a log(1 + *y*) transformation. Also depicted are standard deviations and point-wise ANOVA results with *p*-values smaller than 0.01 after ignoring missing values. A problem with this exploratory analysis is that the 36 *p*-values cannot be easily combined in making inferences regarding overall differences. Additionally, the comparison between treatment conditions when missing values are ignored may lead to biased conclusions when missingness is not *completely at random*. For example, if smokers in the three treatment groups dropped out with higher probabilities given a higher level of previously observed carbon monoxide while smokers in the control group dropped out completely at random, then mean levels of carbon monoxide in the treatment groups would turn out to be lower than those in the control group at visit times close to the termination of the study, even though there are no treatment effects at all.

The average and SD curves for the log-scaled carbon monoxide levels. On this plot, the four mean curves of the log-scaled carbon monoxide levels and the corresponding pointwise standard errors are drawn for each of the four treatment conditions: Control, RP-only, CM-only, and RP+CM.

In Figure 2, the patterns of missingness are plotted for each treatment group, after a sorting process on the dropout times. From the graphs, it is seen that missingness due to dropout corresponds to monotonic forms. At the termination of the study, up to 36 per cent of the participants had withdrawn. An overall percentage of 4.3 per cent of intermittent missing values is seen. The patterns and rates of missing values in this study are typical in substance abuse research. Assuming that intermittent missing values and dropouts were ignorable, random-effects models were applied to the whole incomplete data set and a significantly favorable effect of CM was reported in Shoptaw *et al*. [21]. In Section 4, we will reanalyze this set of carbon monoxide levels to illustrate the various modeling strategies introduced in the following two sections.
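The dropout/intermittent split summarized above can be recomputed directly from the raw data matrix. The following is a minimal sketch (the function and variable names are ours, not the paper's), assuming missing cells are coded `np.nan` and that every missing cell after a subject's last observed visit counts as dropout:

```python
import numpy as np

def missingness_profile(Y):
    """Classify missing cells in an N x J repeated-measures matrix
    (np.nan = missing) as dropout (the monotone tail after the last
    observed visit) or intermittent (gaps before the last observed
    visit), and summarize their rates."""
    N, J = Y.shape
    miss = np.isnan(Y)
    dropout = np.zeros_like(miss)
    for i in range(N):
        obs = np.flatnonzero(~miss[i])
        last = obs.max() if obs.size else -1
        dropout[i, last + 1:] = True          # monotone tail = dropout
    intermittent = miss & ~dropout
    return {
        "pct_dropout_subjects": 100.0 * np.mean(dropout.any(axis=1)),
        "pct_intermittent_cells": 100.0 * intermittent.mean(),
    }
```

Applied to the smoking cessation data, a summary of this kind would reproduce the 36 per cent dropout and 4.3 per cent intermittent rates quoted above.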

For a longitudinal data set with a balanced design, *J* repeated measures are potentially observed on each of the *N* independent subjects at times $t_{i1},\dots,t_{iJ}$; the responses for subject $i$ are collected into ${\mathbf{y}}_{i}={(y_{i1},\dots,y_{iJ})}^{\text{T}}$, with covariates ${\mathbf{X}}_{i}$.

When values of some measures are missing, we partition **y*** _{i}* into two parts,
${\mathbf{y}}_{i}={({\mathbf{y}}_{i}^{\text{obs}},{\mathbf{y}}_{i}^{\text{mis}})}^{\text{T}}$, where
${\mathbf{y}}_{i}^{\text{obs}}$ indicates the observed values and
${\mathbf{y}}_{i}^{\text{mis}}$ indicates values that would be observed if they were not missing. For convenience, we also introduce a vector of missingness indicators, ${\mathbf{r}}_{i}={(r_{i1},\dots,r_{iJ})}^{\text{T}}$, where $r_{ij}=1$ if $y_{ij}$ is missing and $r_{ij}=0$ otherwise. The likelihood function based on the observed data is then

$$L(\boldsymbol{\theta},\boldsymbol{\varphi}\mid{\mathbf{y}}_{i}^{\text{obs}},{\mathbf{X}}_{i},{\mathbf{r}}_{i})\propto\prod_{i=1}^{N}\int f({\mathbf{y}}_{i},{\mathbf{r}}_{i}\mid{\mathbf{X}}_{i},\boldsymbol{\theta},\boldsymbol{\varphi})\,\text{d}{\mathbf{y}}_{i}^{\text{mis}}$$

where vectors **θ** and **ϕ**, respectively, represent the parameters of the measurement model and those of the missingness mechanism. Determined by possible causal pathways, there exist at least three approaches to decompose the joint distribution of the complete data and missingness indicators: outcome-dependent factorization, pattern-dependent factorization, and parameter-dependent factorization. Accordingly, we have the following models for incomplete longitudinal data.

*Selection model* factors the joint distribution $f({\mathbf{y}}_{i},{\mathbf{r}}_{i}\mid{\mathbf{X}}_{i},\boldsymbol{\theta},\boldsymbol{\varphi})$ as

$$f({\mathbf{y}}_{i},{\mathbf{r}}_{i}\mid{\mathbf{X}}_{i},\boldsymbol{\theta},\boldsymbol{\varphi})=f({\mathbf{y}}_{i}\mid{\mathbf{X}}_{i},\boldsymbol{\theta})\,f({\mathbf{r}}_{i}\mid{\mathbf{y}}_{i},{\mathbf{X}}_{i},\boldsymbol{\varphi})$$

where $f({\mathbf{r}}_{i}\mid{\mathbf{y}}_{i},{\mathbf{X}}_{i},\boldsymbol{\varphi})$ characterizes the missingness mechanism, which is allowed to depend on both observed and missing components of ${\mathbf{y}}_{i}$.

*Pattern-mixture model* is a pattern-dependent model, assuming that the distribution of repeated measures varies with the missingness patterns; the joint distribution is factored as

$$f({\mathbf{y}}_{i},{\mathbf{r}}_{i}\mid{\mathbf{X}}_{i},\boldsymbol{\theta},\boldsymbol{\varphi})=f({\mathbf{y}}_{i}\mid{\mathbf{r}}_{i},{\mathbf{X}}_{i},\boldsymbol{\theta})\,f({\mathbf{r}}_{i}\mid{\mathbf{X}}_{i},\boldsymbol{\varphi})$$

Assuming that there are *P* patterns of missingness in a data set, the marginal distribution of **y*** _{i}* would be a mixture of pattern-specific distributions,
$f({\mathbf{y}}_{i})={\sum}_{p=1}^{P}f({\mathbf{y}}_{i}\mid{r}_{i}=p,{\mathbf{X}}_{i},{\boldsymbol{\theta}}^{(p)}){\pi}_{p}$, where ${\pi}_{p}$ denotes the marginal probability of falling into the $p$th pattern.

*Shared-parameter model* assumes that ${\mathbf{y}}_{i}$ and ${\mathbf{r}}_{i}$ are conditionally independent given a vector of shared random effects ${\boldsymbol{\xi}}_{i}$, so that

$$f({\mathbf{y}}_{i},{\mathbf{r}}_{i}\mid{\mathbf{X}}_{i},\boldsymbol{\theta},\boldsymbol{\varphi})=\int f({\mathbf{y}}_{i}\mid{\boldsymbol{\xi}}_{i},{\mathbf{X}}_{i},\boldsymbol{\theta})\,f({\mathbf{r}}_{i}\mid{\boldsymbol{\xi}}_{i},{\mathbf{X}}_{i},\boldsymbol{\varphi})\,f({\boldsymbol{\xi}}_{i})\,\text{d}{\boldsymbol{\xi}}_{i}$$

From the viewpoint of causation, shared ‘parameters’ play the role of confounders for the relationship between ${\mathbf{y}}_{i}$ and ${\mathbf{r}}_{i}$.

In certain biomedical studies, both missingness patterns and values of repeated measures are of interest. For example, in a heart-disease study, the repeatedly measured blood pressures and the survival lengths (a form of dropout patterns) of the patients are apt to be modeled jointly [23]. Within this scenario, the above selection, pattern-mixture, and shared-parameter models can be applied directly or after some modification. In the majority of biomedical research, however, only the parameters for the distribution of the repeated measures (i.e. **θ**) are of interest, while those related to missingness patterns are viewed as nuisance parameters. In this latter scenario, it would be desirable that we could ignore the missing values when making inferences regarding **θ**.

Within the setting of selection models, the concept of ‘ignorability’ was defined by Rubin [8] and extensively addressed thereafter. Missing values are said to be *ignorable* when two conditions hold: (i) **r*** _{i}* is independent of
${\mathbf{y}}_{i}^{\text{mis}}$, given
${\mathbf{y}}_{i}^{\text{obs}}$ and ${\mathbf{X}}_{i}$ (the condition of *missing at random*, MAR); and (ii) the parameters $\boldsymbol{\theta}$ and $\boldsymbol{\varphi}$ are distinct.
Within the context of pattern-mixture or shared-parameter models, we define ignorability as a condition under which observed data can be used to estimate **θ** without bias. For pattern-mixture models, so long as
${\mathbf{y}}_{i}^{\text{mis}}$ does not depend on **r*** _{i}* (given
${\mathbf{y}}_{i}^{\text{obs}}$ and ${\mathbf{X}}_{i}$), the missing values can be treated as ignorable in this sense.
As seen in Figure 2, dropout patterns display monotonic forms after sorting on the time of withdrawal. This feature makes it easier to characterize the dropout mechanism. Also, considering the fact that dropouts are more problematic in practice, we focus on modeling nonignorable dropouts in this paper.

Let us denote $t_{d_i}$ as the dropout time for the $i$th subject, where $d_i$ indexes the first occasion at which the subject is missing due to withdrawal ($d_i=J+1$ if the subject completes the study). The likelihood contribution of subject $i$ under a selection model is then

$${L}_{i}(\boldsymbol{\theta},\boldsymbol{\varphi}\mid{\mathbf{y}}_{i}^{\text{obs}},{\mathbf{r}}_{i})\propto\prod_{j=1}^{{d}_{i}-1}f({y}_{ij}\mid{\mathbf{H}}_{ij},\boldsymbol{\theta})\prod_{j=1}^{{d}_{i}-1}[1-{p}_{j}({y}_{ij},{\mathbf{H}}_{ij})]\,\Pr({r}_{{id}_{i}}=1\mid{\mathbf{H}}_{{id}_{i}})$$

where ${p}_{j}({y}_{ij},{\mathbf{H}}_{ij})=\Pr({r}_{ij}=1\mid{y}_{ij},{\mathbf{H}}_{ij},\boldsymbol{\varphi})$ is the probability of dropout at the $j$th occasion and ${\mathbf{H}}_{ij}={(y_{i1},\dots,y_{i,j-1})}^{\text{T}}$ denotes the history of observed measures. A logistic form is commonly assumed for the dropout probability,

$$\text{logit}(\Pr({r}_{ij}=1\mid{y}_{ij},{\mathbf{H}}_{ij},\boldsymbol{\varphi}))={\varphi}_{0}+{\varphi}_{1}{y}_{ij}+{\boldsymbol{\varphi}}_{2}^{\text{T}}{\mathbf{H}}_{ij}$$

where a nonzero value of ${\varphi}_{1}$ implies an outcome-dependent nonignorable dropout mechanism.
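To make the role of the outcome-dependence parameter concrete, the dropout probability under a logistic model of this kind can be sketched as follows. This is a minimal illustration (the function name and arguments are ours), in which the history $\mathbf{H}_{ij}$ is reduced to the single previous measure $y_{i,j-1}$:

```python
import math

def dropout_prob(y_curr, y_prev, phi0, phi1, phi2):
    """Pr(r_ij = 1 | y_ij, y_{i,j-1}) under a logistic dropout model
    with linear predictor phi0 + phi1*y_curr + phi2*y_prev.
    A Diggle-Kenward-style sketch: in the paper's formulation H_ij
    may include a longer history and covariates."""
    lp = phi0 + phi1 * y_curr + phi2 * y_prev
    return 1.0 / (1.0 + math.exp(-lp))
```

With `phi1 = 0`, the probability depends only on the observed history, which corresponds to an ignorable (MAR) dropout mechanism; a positive `phi1` makes subjects with higher current (possibly unobserved) values more likely to withdraw.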

The full log-likelihood function of the whole data set with sample size *N* for **θ** and **ϕ** can be partitioned as $l(\boldsymbol{\theta},\boldsymbol{\varphi})={l}_{1}(\boldsymbol{\theta})+{l}_{2}(\boldsymbol{\varphi})+{l}_{3}(\boldsymbol{\varphi},\boldsymbol{\theta})$, where
${l}_{1}(\boldsymbol{\theta})={\sum}_{i=1}^{N}\log\{f({\mathbf{y}}_{i}^{\text{obs}})\}$ corresponds to the observed-data log-likelihood function for **θ**,
${l}_{2}(\boldsymbol{\varphi})={\sum}_{i=1}^{N}{\sum}_{j=1}^{{d}_{i}-1}\log\{1-{p}_{j}({y}_{ij},{\mathbf{H}}_{ij})\}$, and ${l}_{3}(\boldsymbol{\varphi},\boldsymbol{\theta})={\sum}_{i:{d}_{i}\le J}\log\{\Pr({r}_{{id}_{i}}=1\mid{\mathbf{H}}_{{id}_{i}})\}$, which involves integration over the missing value ${y}_{{id}_{i}}$ and hence depends on both $\boldsymbol{\theta}$ and $\boldsymbol{\varphi}$.

As shown by Verbeke and Molenberghs [12], the idea of the selection model can be traced back to the Tobit model of Heckman [25]. Later, Troxel *et al*. [18] further extended it to handle nonmonotone missing values. Selection models for categorical and other types of repeated measures have also been developed; see [26–29].

The high sensitivity of selection modeling to misspecification of the measurement process and the dropout mechanism has led to a growing interest in pattern-mixture modeling [30, 31]. After their initial introduction [32, 33], pattern-mixture models have received growing attention, e.g. for continuous repeated measures [17, 34–37] and for categorical measures [15, 38, 39].

For dropouts, a pattern-mixture model factorizes the joint distribution $f({\mathbf{y}}_{i},{\mathbf{r}}_{i})$ according to the time of withdrawal. For subjects who drop out after the $j$th occasion, the pattern-specific density of the complete data factors as

$${f}_{j}(\mathbf{y})={f}_{j}({\mathbf{y}}^{\text{obs}})\,{f}_{j}({\mathbf{y}}^{\text{mis}}\mid{\mathbf{y}}^{\text{obs}})$$

where $\mathbf{y}^{\text{obs}}={(y_{1},\dots,y_{j})}^{\text{T}}$ and $\mathbf{y}^{\text{mis}}={(y_{j+1},\dots,y_{J})}^{\text{T}}$. Because pattern $j$ provides no data on $\mathbf{y}^{\text{mis}}$, the conditional density ${f}_{j}({\mathbf{y}}^{\text{mis}}\mid{\mathbf{y}}^{\text{obs}})$ is not identifiable from the observed data and must be restricted, e.g. expressed as a mixture over the patterns with later dropout times:

$${f}_{j}({y}_{s}\mid{y}_{1},\dots,{y}_{s-1})=\sum_{t=s}^{J}{\omega}_{st}\,{f}_{t}({y}_{s}\mid{y}_{1},\dots,{y}_{s-1}),\qquad s=j+1,\dots,J$$

Using this restriction method, the full density function ${f}_{j}(\mathbf{y})$ is identified as

$${f}_{j}(\mathbf{y})={f}_{j}({\mathbf{y}}^{\text{obs}})\prod_{s=0}^{J-j-1}\Big[\sum_{t=J-s}^{J}{\omega}_{J-s,t}\,{f}_{t}({y}_{J-s}\mid{y}_{1},\dots,{y}_{J-s-1})\Big]$$

Depending on the specification of the weights, several identification schemes can be implemented. Setting all the weights to positive values corresponds to the identification scheme called *available case missing values* (ACMV [36]), which is the natural counterpart of the mechanism of MAR in the context of selection models. The restriction scheme called *complete-case missing values* (CCMV [32]) identifies ${f}_{j}({y}_{s}\mid{y}_{1},\dots,{y}_{s-1})$ using the pattern of the completers:

$${f}_{j}({y}_{s}\mid{y}_{1},\dots,{y}_{s-1})={f}_{J}({y}_{s}\mid{y}_{1},\dots,{y}_{s-1}),\qquad s=j+1,\dots,J$$

In this case, weights are set as ${\omega}_{sJ}=1$ and ${\omega}_{st}=0$ for $t<J$. Another scheme, *neighboring-case missing values* (NCMV), identifies the conditional densities using the nearest pattern:

$${f}_{j}({y}_{s}\mid{y}_{1},\dots,{y}_{s-1})={f}_{s}({y}_{s}\mid{y}_{1},\dots,{y}_{s-1}),\qquad s=j+1,\dots,J$$

which corresponds to ${\omega}_{ss}=1$ and ${\omega}_{st}=0$ for $t>s$.
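The restriction schemes above differ only in how the weights $\omega_{st}$ are assigned. The following sketch (names ours) returns the weight vector used to identify $f_j(y_s\mid y_1,\dots,y_{s-1})$ from the patterns $t=s,\dots,J$; note that the true ACMV weights are positive quantities determined by the observed-data densities [36], so the uniform values below are a placeholder only:

```python
def restriction_weights(s, J, scheme):
    """Weights {t: w_st} over donor patterns t = s..J (patterns indexed
    by last observed occasion). CCMV puts all mass on the completers'
    pattern (w_sJ = 1); NCMV on the nearest pattern (w_ss = 1); ACMV
    uses positive weights for every t -- the equal weights used here
    stand in for the density-determined ACMV weights."""
    ts = range(s, J + 1)
    if scheme == "CCMV":
        return {t: (1.0 if t == J else 0.0) for t in ts}
    if scheme == "NCMV":
        return {t: (1.0 if t == s else 0.0) for t in ts}
    if scheme == "ACMV":  # placeholder: equal positive weights
        return {t: 1.0 / (J - s + 1) for t in ts}
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, with $J=5$ and $s=3$, CCMV borrows entirely from pattern 5 while NCMV borrows entirely from pattern 3.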

When the dynamic transition features in longitudinal data are of interest, an appropriate analytical approach is *via* transition models, which use previous observations to predict current ones. Here, we propose a shared-parameter model called the REMTM to deal with continuous repeated measures subject to nonignorable dropout. Similar models have been proposed to analyze binary or count measures with nonmonotone missing values [14, 19]. Within the REMTM, for each subject the repeated measures (**y*** _{i}*) are conditionally independent of the missingness indicators (${\mathbf{r}}_{i}$) given a subject-specific random effect ${\xi}_{i}$ shared by the measurement and dropout processes.

To model the measurement process, an order-1 Markov chain can be assumed for ${\mathbf{y}}_{i}={(y_{i1},\dots,y_{iJ})}^{\text{T}}$:

$${y}_{ij}={\mathbf{x}}_{ij}\mathbf{\beta}+({y}_{i,j-1}-{\mathbf{x}}_{i,j-1}\mathbf{\beta})\alpha +{\xi}_{i}+{\epsilon}_{ij}$$

where
${\xi}_{i}\stackrel{\text{iid}}{\sim}\text{N}(0,{\sigma}_{\xi}^{2})$ denotes the random intercept for subject *i* and
${\epsilon}_{ij}\stackrel{\text{iid}}{\sim}\text{N}(0,{\sigma}_{\epsilon}^{2})$ represents the residual errors as seen in standard linear regression models. Of primary interest, **β** contains the fixed parameters regarding treatment efficacy in clinical trials. The parameter *α* sets up the link between the previous and the current unexplained measurement effects, i.e. between ${y}_{i,j-1}-{\mathbf{x}}_{i,j-1}\boldsymbol{\beta}$ and ${y}_{ij}-{\mathbf{x}}_{ij}\boldsymbol{\beta}$.

To model the dropout process, a logistic transition model with random intercepts is used. For missingness indicators ${\mathbf{r}}_{i}={(r_{i1},\dots,r_{iJ})}^{\text{T}}$, where ${r}_{ij}=1$ indicates that subject $i$ has withdrawn by occasion $j$, the transition probabilities are

$$P({r}_{ij}=k\mid{\xi}_{i},{\mathbf{x}}_{ij},{r}_{i,j-1}=0)=\begin{cases}\dfrac{1}{1+\exp({\mathbf{x}}_{ij}\boldsymbol{\eta}+{\xi}_{i}\gamma)}&\text{if}\ k=0\\[2ex]\dfrac{\exp({\mathbf{x}}_{ij}\boldsymbol{\eta}+{\xi}_{i}\gamma)}{1+\exp({\mathbf{x}}_{ij}\boldsymbol{\eta}+{\xi}_{i}\gamma)}&\text{if}\ k=1\end{cases}$$

where the parameters **η** calibrate the influence of covariates on the probability of dropout. A nonzero *γ* indicates that the dropout process shares the random intercepts with the measurement process and hence that dropout is informative.
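The two sub-models can be combined in a small simulation. The following sketch (all parameter values and names are illustrative, not from the paper) generates one subject's trajectory under the order-1 Markov measurement equation and the shared-intercept dropout model above, assuming every subject is observed at the first occasion:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_remtm_subject(x, beta, alpha, sigma_xi, sigma_eps, eta, gamma):
    """Simulate one subject under the REMTM sketch:
    y_ij = x_ij'beta + alpha*(y_{i,j-1} - x_{i,j-1}'beta) + xi_i + eps_ij,
    with dropout hazard logit = x_ij'eta + gamma*xi_i sharing xi_i.
    Returns (y, d): the J-vector of responses (NaN after withdrawal)
    and the first dropout occasion d (d = J for completers)."""
    J = x.shape[0]
    xi = rng.normal(0.0, sigma_xi)            # shared random intercept
    y = np.full(J, np.nan)
    for j in range(J):
        # dropout decision at occasion j (first occasion always observed)
        p_drop = 1.0 / (1.0 + np.exp(-(x[j] @ eta + gamma * xi)))
        if j > 0 and rng.random() < p_drop:
            return y, j                        # y[j:] remain NaN
        prev = (y[j - 1] - x[j - 1] @ beta) if j > 0 else 0.0
        y[j] = x[j] @ beta + alpha * prev + xi + rng.normal(0.0, sigma_eps)
    return y, J
```

A positive `gamma` makes subjects with larger random intercepts (hence higher unexplained responses) more likely to drop out, which is exactly the informative-dropout behavior the shared parameter encodes.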

By combining the above sub-models for measurement and dropout, we can write the full-likelihood function for parameters
$\boldsymbol{\theta}={({\boldsymbol{\beta}}^{\text{T}},{\sigma}_{\xi}^{2},{\sigma}_{\epsilon}^{2})}^{\text{T}}$ and $\boldsymbol{\varphi}={({\boldsymbol{\eta}}^{\text{T}},\gamma)}^{\text{T}}$, i.e.

$$L(\boldsymbol{\theta},\boldsymbol{\varphi})\propto\prod_{i=1}^{N}\int\Big[\Big\{\prod_{j=1}^{{d}_{i}-1}p({y}_{ij}\mid{\mathbf{x}}_{ij},{y}_{i,j-1},{\xi}_{i},\boldsymbol{\theta})\Big\}\Big\{\prod_{j=1}^{{d}_{i}}p({r}_{ij}\mid{\mathbf{x}}_{ij},{r}_{i,j-1},{\xi}_{i},\boldsymbol{\varphi})\Big\}p({\xi}_{i})\Big]\,\text{d}{\xi}_{i}$$

where $p({\xi}_{i})$ is the density function of the normally distributed random intercept ${\xi}_{i}$.

In practical settings, it is common to have longitudinal data sets with both intermittent missing values and dropouts. To deal with such data sets, this section proposes a series of imputation strategies to implement the full-likelihood-based models introduced earlier.

Imputation is a popular device for statistical analysis: it fills the empty cells in a data matrix with ‘plausible’ values predicted from empirical evidence or assumption-driven models, enforcing an incomplete data set into a complete one so that standard complete-data modeling techniques can be applied afterward. For repeated measures, the method of imputation is especially useful since repeated measures are often highly correlated. By imputing a data set once and treating it as the actual complete data set, however, the uncertainty in parameter estimation is underestimated. To overcome this limitation, Rubin [46] proposed the method called *multiple imputation*, within which multiple sets of imputed values are generated for the same set of missing values.

As argued earlier, intermittent missingness and dropout differ not only in pattern but also in mechanism and hence should be treated in different ways. Yang and Shoptaw [20] introduced the idea of MPI by conducting imputations only for intermittent missing values so that dropouts can be isolated and treated differently. MPI potentially offers a generic solution that can be applied throughout the whole spectrum of longitudinal data analysis. By partitioning
${\mathbf{y}}_{i}^{\text{mis}}$ into (
${\mathbf{y}}_{i}^{\text{IM}},{\mathbf{y}}_{i}^{\text{DM}}$) to denote intermittent missing values and dropouts, the first step of MPI is to draw *m* > 1 sets of imputations for the intermittent missing values:
${\mathbf{y}}_{i}^{\text{IM}(1)},{\mathbf{y}}_{i}^{\text{IM}(2)},\dots ,{\mathbf{y}}_{i}^{\text{IM}(m)}\phantom{\rule{0.16667em}{0ex}}(i=1,\dots ,N)$. Then, in the second step, each partially imputed data set (**Y**^{obs}, **Y**^{IM(}^{j}^{)}) (*j* = 1, …, *m*) along with **X** (i.e. data on covariates) is treated with selection, pattern-mixture, shared-parameter, or any other modeling strategy to deal with potentially nonignorable dropouts. Finally, in the third step, multiple versions of analysis are consolidated to derive an overall inference.

For this final step, a set of rules for consolidation was originally developed by Rubin [46] and later improved by Rubin and Schenker [47]. For the *j*th (*j* = 1, …, *m*) imputed data set, we denote ${\widehat{Q}}^{(j)}$ as the point estimate of *Q* (a parameter or quantity of interest) and ${\widehat{U}}^{(j)}$ as the corresponding variance estimate. Then the MPI estimate (overall point estimate) of *Q* is
$\overline{Q}=(1/m){\sum}_{j=1}^{m}{\widehat{Q}}^{(j)}$. The associated variance of $\overline{Q}$ is $T=\overline{U}+((m+1)/m)B$, where
$\overline{U}=(1/m){\sum}_{j=1}^{m}{\widehat{U}}^{(j)}$ and
$B=(1/(m-1)){\sum}_{j=1}^{m}{({\widehat{Q}}^{(j)}-\overline{Q})}^{2}$, respectively, represent the *within-imputation variability* and the *between-imputation variability*. For hypothesis testing, we can use the statistic $(\overline{Q}-Q){T}^{-1/2}$, which approximately follows a *t*-distribution with degrees of freedom $\nu=(m-1){[1+\overline{U}/((1+{m}^{-1})B)]}^{2}$. Making proper MPI inferences requires that the multiple imputations be created ‘independently.’ In Section 3.2, we will see how to create independent imputed values *via* MCMC algorithms.

Within the second step of the above MPI inferential procedure, additional multiple imputations can be made for the dropouts when fitting a selection, pattern-mixture, or shared-parameter model. That is, for each partially imputed data set (**Y**^{obs}, **Y**^{IM(}^{j}^{)}), we draw *n* > 1 sets of imputations for the dropouts:
${\mathbf{y}}_{i}^{\text{DM}(j,1)},{\mathbf{y}}_{i}^{\text{DM}(j,2)},\dots,{\mathbf{y}}_{i}^{\text{DM}(j,n)}\phantom{\rule{0.16667em}{0ex}}(i=1,\dots,N)$. For emphasis, this version of MPI with sequential imputations is called a *two-stage MPI* in this article. When creating imputations for dropouts, the predictive density functions
$p({\mathbf{y}}_{i}^{\text{DM}(j,k)}\mid{\mathbf{y}}_{i}^{\text{obs}},{\mathbf{y}}_{i}^{\text{IM}(j)})$ have different forms depending on the modeling strategy used (*j* = 1, …, *m*; *k* = 1, …, *n*). In Section 4.2, we will see that this method is especially useful for implementing pattern-mixture models. After the two-stage imputation, we obtain *m* × *n* complete data sets with
${\mathbf{y}}_{i}^{(j,k)}=({\mathbf{y}}_{i}^{\text{obs}},{\mathbf{y}}_{i}^{\text{IM}(j)},{\mathbf{y}}_{i}^{\text{DM}(j,k)})\phantom{\rule{0.16667em}{0ex}}(i=1,\dots,N;\phantom{\rule{0.16667em}{0ex}}j=1,\dots,m;\phantom{\rule{0.16667em}{0ex}}k=1,\dots,n)$. Each can be analyzed by traditional longitudinal models, e.g. marginal models with GEE or linear mixed-effects models, since concerns regarding the missingness and dropout mechanisms have been resolved during the model-based imputation. Similar to MPI, the last step of the two-stage MPI is to derive the overall inference by combining the multiple analytical results. Nonetheless, Rubin’s rules presented earlier cannot be applied simply by presuming that the number of imputations is now *m* × *n* instead of *m*: among the *m* × *n* complete data sets, the imputed values are far from independent of each other, because each block (
${\mathbf{y}}_{i}^{(j,1)},\dots,{\mathbf{y}}_{i}^{(j,n)}$) contains identical imputed values
${\mathbf{y}}_{i}^{\text{IM}(j)}$. Adopting the idea of ANOVA with nested blocks, a modified set of Rubin’s rules was developed by Shen [48], which can be used for making two-stage MPI inferences. Within other contexts (e.g. cross-sectional survey data with nonresponse), sequential imputation strategies are also seen; see Harel [49] and Rubin [50].

One does not need to subscribe to the Bayesian paradigm in developing a ‘proper’ imputation method, so long as it satisfies a set of technical conditions [46] that guarantee frequency-valid inferences; an example of non-Bayesian imputation is given in Section 3.4. Nonetheless, this set of conditions is useful for evaluating the properties of a given method but provides little practical guidance for devising one. For this reason, a Bayesian process is often preferred. By specifying a parametric model based on the full-likelihood function and applying prior distributions to the unknown model parameters, we can simulate multiple independent draws from the conditional distribution of the missing data given the observed data using Bayes’ theorem. For the selection and REMTM models, such a conditional distribution would usually be too complicated to be simulated directly. As a solution, a collection of MCMC algorithms can be used. Within MCMC, parameters are drawn from a complicated distribution by forming a Markov chain that has this distribution as its stationary distribution. One of the most popular MCMC methods is *Gibbs sampling*, which simulates the conditional distribution of each component of a multivariate random variable given the other components in a cyclic manner. A series of Gibbs samplers have been developed and implemented in software packages, such as R/S-plus and SAS, to deal with various types of incomplete multivariate data [13].

Conceptually, one of the Gibbs sampling algorithms dealing with multivariate normal data [13] can be modified to conduct partial imputations within MPI. By iterating the following two steps, we have the Gibbs sampling algorithm for creating imputations for intermittent missing values:

- *I-step*: Draw values of the intermittent missing data from their conditional predictive distribution, i.e. for *i* = 1, …, *N*,
$${\mathbf{y}}_{i}^{\text{IM}(t+1)}\sim f({\mathbf{y}}_{i}^{\text{IM}}\mid{\mathbf{y}}_{i}^{\text{obs}},{\mathbf{X}}_{i},{\mathbf{r}}_{i},{\boldsymbol{\psi}}^{(t)})$$
- *P-step*: Conditioning on the drawn values for the intermittent missing data (**Y**^{IM(}^{t}^{+1)}), draw parameters **ψ** from their posterior distribution based on the partially imputed data,
$${\boldsymbol{\psi}}^{(t+1)}\sim f(\boldsymbol{\psi}\mid{\mathbf{Y}}^{\text{obs}},{\mathbf{Y}}^{\text{IM}(t+1)},\mathbf{X},\mathbf{R})$$

In this algorithm, **ψ** = (**θ**, **ϕ**) represents a vector of all parameters in a model chosen to characterize the complete data and the missingness mechanism. If nonignorability is assumed for intermittent missingness, we can use a selection or REMTM model; although detailed modeling formulations for nonignorable intermittent missingness are not given in this article, they are conceptually natural extensions of the models given in Section 2.3. For ignorable missingness, we can choose a linear mixed-effects or a transition model. By applying Bayes’ theorem, we have $f(\boldsymbol{\psi}\mid{\mathbf{Y}}^{\text{obs}},{\mathbf{Y}}^{\text{IM}(t+1)},\mathbf{X},\mathbf{R})\propto f(\boldsymbol{\psi})\,f({\mathbf{Y}}^{\text{obs}},{\mathbf{Y}}^{\text{IM}(t+1)},\mathbf{X},\mathbf{R}\mid\boldsymbol{\psi})$, where $f(\boldsymbol{\psi})$ represents the prior distribution for **ψ**. Depending on the choice of modeling strategy, the conditional predictive distribution
$f({\mathbf{y}}_{i}^{\text{IM}}\mid{\mathbf{y}}_{i}^{\text{obs}},{\mathbf{X}}_{i},{\mathbf{r}}_{i},\boldsymbol{\psi})$ takes different forms.
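As a toy illustration of the I-step/P-step cycle (not the paper's implementation), the following sketch imputes a possibly-missing second coordinate under a bivariate normal model with a known covariance matrix and a flat prior on the mean; all names and simplifications are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_partial_impute(Y, Sigma, n_iter=500):
    """Toy I-step/P-step sampler: rows of Y are iid N(mu, Sigma) with
    Sigma KNOWN and a flat prior on mu. Only the second column may
    contain np.nan, and missingness is assumed ignorable. Returns the
    chain of (mu draw, imputed values) states."""
    Y = Y.copy()
    miss = np.isnan(Y[:, 1])               # incomplete rows
    Y[miss, 1] = np.nanmean(Y[:, 1])       # crude starting values
    N = len(Y)
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    mu = Y.mean(axis=0)
    chain = []
    for _ in range(n_iter):
        # I-step: draw y2 | y1, mu from the conditional normal
        cond_mean = mu[1] + s12 / s11 * (Y[miss, 0] - mu[0])
        cond_var = s22 - s12 ** 2 / s11
        Y[miss, 1] = cond_mean + rng.normal(0.0, np.sqrt(cond_var), miss.sum())
        # P-step: draw mu | completed data (flat prior => N(ybar, Sigma/N))
        mu = rng.multivariate_normal(Y.mean(axis=0), Sigma / N)
        chain.append((mu.copy(), Y[miss, 1].copy()))
    return chain
```

The realistic samplers discussed in this section replace the known-covariance normal model with the selection, pattern-mixture, or REMTM structure, but the alternation between imputing missing values and drawing parameters is the same.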

Starting from an initial value **ψ**^{(0)} (i.e. *t* = 0), which can be any reasonable value obtained in an inexpensive way, the above I-step and P-step are repeated for a sufficiently large number of iterations to yield a stochastic sequence {(**ψ**^{(}^{t}^{)}, **Y**^{IM(}^{t}^{)}): *t* = 1, …, *T*}. Provided that certain regularity conditions [51] hold, the empirical joint distribution of the parameters and missing values within this sequence approaches the stationary distribution *f* (**ψ**, **Y**^{IM}|**Y**^{obs}) as *T* → ∞. In practice, we monitor the convergence properties of the process [52]. If diagnostics suggest that convergence is achieved after *T*_{0} iterations, we retain the simulated missing values every (*T* − *T*_{0})/*m* iterations starting from *t* = *T*_{0} + 1 and treat them as multiple partial imputations, which can be viewed as approximately independent for large *T*. In this way, *m* > 1 sets of partial imputations are obtained, and each can be further analyzed to deal with dropouts.
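The retention rule just described can be written down directly. A small sketch (function name ours), assuming *T* − *T*_{0} is a multiple of *m*:

```python
def retained_iterations(T, T0, m):
    """Indices of the Gibbs iterations kept as the m partial
    imputations: after a burn-in of T0 iterations, retain every
    (T - T0) // m-th draw so that consecutive retained states are
    far enough apart to be treated as approximately independent."""
    gap = (T - T0) // m
    return [T0 + gap * (k + 1) for k in range(m)]
```

For instance, with *T* = 1000, *T*_{0} = 500, and *m* = 5, the retained iterations are spaced 100 apart.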

A two-stage MPI procedure conducts a second round of imputation regarding the dropouts within each partially imputed data set, (**Y**^{obs}, **Y**^{IM(}^{j}^{)}) (*j* = 1, …, *m*). With the same flow structure as that of the sampler in Section 3.2.1, a Gibbs sampler for two-stage MPI also consists of two steps at each iteration:

*I-step*: Missing values due to dropout are drawn from their conditional predictive distribution, i.e.

$$\mathbf{y}_{i}^{\text{DM}(t+1)}\sim f(\mathbf{y}_{i}^{\text{DM}}\mid\mathbf{y}_{i}^{\text{obs}},\mathbf{y}_{i}^{\text{IM}(j)},\mathbf{X}_{i},\mathbf{r}_{i},\boldsymbol{\psi}),\quad i=1,\dots,N$$

*P-step*: Conditioning on the drawn values for dropouts (**Y**^{DM(t+1)}), draw the parameters **ψ** from their complete-data posterior distribution,

$$\boldsymbol{\psi}^{(t+1)}\sim f(\boldsymbol{\psi}\mid\mathbf{Y}^{\text{obs}},\mathbf{Y}^{\text{IM}(j)},\mathbf{Y}^{\text{DM}(t+1)},\mathbf{X},\mathbf{R})$$

The model used in this algorithm aims at characterizing data subject only to dropout, since the intermittent missing values have been imputed in the first stage. Therefore, the parameters **ψ** are different from those in the Gibbs sampler of Section 3.2.1, although the same symbol is used for presentation. A complete run of two-stage MPI requires running both the above sampler and the one in Section 3.2.1, and the imputation of **Y**^{DM} is nested within each partially imputed data set (**Y**^{obs}, **Y**^{IM(j)}) (*j* = 1, …, *m*). By running this algorithm long enough, we can obtain *n* > 1 versions of imputation for **Y**^{DM} from the sampled stochastic sequence {(**ψ**^{(t)}, **Y**^{DM(t)}): *t* = 1, …, *T*} and end up with *m* × *n* sets of complete data: {(**Y**^{obs}, **Y**^{IM(j)}, **Y**^{DM(j,k)}): *j* = 1, …, *m*, *k* = 1, …, *n*}. Each can be analyzed using standard longitudinal models.
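The nesting that produces the *m* × *n* complete data sets can be sketched as follows. The normal-noise fills are placeholders for the Gibbs samplers of Sections 3.2.1 and 3.3, and all names are illustrative, not the authors' code.

```python
import numpy as np

def two_stage_mpi(y_obs, intermittent, dropout, m, n, seed=0):
    """Bookkeeping sketch of two-stage MPI: n dropout imputations nested
    within each of m partial imputations of the intermittent gaps.
    `intermittent` and `dropout` are boolean masks over y_obs; the
    rng.normal fills stand in for the model-based samplers."""
    rng = np.random.default_rng(seed)
    mu = np.nanmean(y_obs)                      # crude center for the fills
    completed = []
    for j in range(m):                          # stage 1: intermittent gaps
        y_im = y_obs.copy()
        y_im[intermittent] = rng.normal(mu, 1.0, intermittent.sum())
        for k in range(n):                      # stage 2: nested dropout fills
            y_full = y_im.copy()
            y_full[dropout] = rng.normal(mu, 1.0, dropout.sum())
            completed.append(y_full)            # complete data set (j, k)
    return completed                            # m * n complete data sets
```

Note that all *n* second-stage data sets within nest *j* share the same first-stage values, which is exactly the nesting required for the two-stage consolidation rules.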

For the selection or the REMTM model, the full-likelihood function can in principle be formulated and evaluated to make likelihood-based inferences. Unfortunately, the computational cost is very high: calculating the dropout probabilities in the selection model, or integrating over the random effects in the REMTM, requires time-consuming numerical methods. Bayesian inference based on MCMC again provides an affordable alternative for fitting the models. By summarizing the sampled parameter values, the posterior distribution is obtained directly, yielding exact inferences that do not rely on large-sample approximation theory.

Recently, we developed a hybrid Gibbs sampling algorithm for selection models with structured covariance matrices [22, 53]. The algorithm for such a model with AR(1) covariance structure is presented here. For complete continuous repeated measures with multivariate normal distribution (**y**_i ~ N(**X**_i**β**, **Σ**), where **Σ** has AR(1) structure with elements σ²ρ^{|j−k|}), the joint posterior distribution of the parameters can be written as

$$P(\boldsymbol{\psi}\mid\mathbf{Y},\mathbf{R})\propto\prod_{i=1}^{N}\left\{f(\mathbf{y}_{i}^{\text{obs}}\mid\boldsymbol{\theta})\times\prod_{j=1}^{d_{i}-1}\left[1-p_{j}(y_{ij},\mathbf{H}_{ij})\right]\times\Pr(r_{i,d_{i}}=1\mid\mathbf{H}_{i,d_{i}})\right\}\times f(\boldsymbol{\psi})$$

This function looks fairly complicated, and the integration within Pr(*r*_{i,d_i} = 1|**H**_{i,d_i}) over the unobserved outcome at withdrawal has no closed-form solution. The hybrid Gibbs sampler avoids this integration by treating the outcome at withdrawal, *y*_{i,d_i}, as missing data to be sampled. Each iteration consists of the following two steps:

*I-step*: Draw the missing values at withdrawal {*y*_{i,d_i}: *i* = 1, …, *N*}:

$$y_{i,d_{i}}\sim f_{i,d_{i}}(y\mid\mathbf{y}_{i}^{\text{obs}},\mathbf{x}_{i,d_{i}},d_{i},\boldsymbol{\psi})\propto\frac{1}{\sqrt{2\pi v_{i}}}\exp\left\{-\frac{(y-\mathbf{x}_{i,d_{i}}\boldsymbol{\beta}-\mu_{i})^{2}}{2v_{i}}\right\}\times\frac{\exp(\phi_{0}+\phi_{1}y+\phi_{2}y_{i,d_{i}-1})}{1+\exp(\phi_{0}+\phi_{1}y+\phi_{2}y_{i,d_{i}-1})}$$

where, by regressing *y*_{i,d_i} on $\mathbf{y}_{i}^{\text{obs}}$, we have $\mu_{i}=\mathbf{C}_{(d_{i}-1)}^{\text{T}}\boldsymbol{\Sigma}_{d_{i}-1}^{-1}(\mathbf{y}_{i}^{\text{obs}}-\mathbf{X}_{i}\boldsymbol{\beta})$, $v_{i}=\sigma^{2}(1-\mathbf{C}_{(d_{i}-1)}^{\text{T}}\boldsymbol{\Sigma}_{d_{i}-1}^{-1}\mathbf{C}_{(d_{i}-1)})$, and **C**_{(j)} = (*ρ*^{j−1}, …, *ρ*)^{T}. For the AR(1) covariance structure, the inverse of the covariance matrix of $\mathbf{y}_{i}^{\text{obs}}$ (i.e. $\boldsymbol{\Sigma}_{d_{i}-1}^{-1}$) has an analytical expression.

*P-step*: Draw the parameters one by one in the following order:

- For *k* = 1, …, *K*, draw the regression coefficients:
$$\beta_{k}\sim f(\beta_{k}\mid\boldsymbol{\psi}_{\backslash\beta_{k}},\mathbf{Y}^{\text{com}},\mathbf{X},\mathbf{R})$$
where **Y**^{com} denotes the completed data (the observed values together with the drawn *y*_{i,d_i}'s), and **ψ**_{\β_k} indicates all parameters except *β*_{k} (‘\’ means ‘except’, similarly defined in the following).
- Draw the covariance parameter:
$$\rho\sim f(\rho\mid\boldsymbol{\psi}_{\backslash\rho},\mathbf{Y}^{\text{com}},\mathbf{X},\mathbf{R})$$
- Draw the variance of the residuals:
$$\sigma^{2}\sim f(\sigma^{2}\mid\boldsymbol{\psi}_{\backslash\sigma^{2}},\mathbf{Y}^{\text{com}},\mathbf{X},\mathbf{R})$$
- For *k* = 0, …, *J*, draw the parameters of the dropout mechanism:
$$\phi_{k}\sim f(\phi_{k}\mid\boldsymbol{\psi}_{\backslash\phi_{k}},\mathbf{Y}^{\text{com}},\mathbf{X},\mathbf{R})$$

In presenting the above algorithm, we have assumed noninformative prior distributions for all the parameters: normal distributions with infinite variance for the *β*_{k}'s and the dropout parameters *ϕ*_{k}, and noninformative priors for *ρ* and *σ*². Each conditional distribution in the P-step can then be simulated with standard univariate techniques such as adaptive rejection sampling [54].
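Because the logistic factor in the I-step density is bounded by one, the conditional for *y*_{i,d_i} can be sampled exactly by rejection, using the normal part as the envelope. The sketch below assumes a Diggle–Kenward-type logit with one lagged term; the coefficient names (`phi0`, `phi1`, `phi2`) and the helper itself are illustrative, not the authors' implementation.

```python
import numpy as np

def draw_missing_at_withdrawal(mean, var, phi0, phi1, phi2, y_prev, rng):
    """Rejection sampler for f(y) ∝ N(y; mean, var) * expit(phi0 + phi1*y
    + phi2*y_prev): propose from the normal factor and accept with the
    logistic probability, which is <= 1, so the target is sampled exactly."""
    sd = np.sqrt(var)
    while True:
        y = rng.normal(mean, sd)
        lin = phi0 + phi1 * y + phi2 * y_prev
        if rng.random() < 1.0 / (1.0 + np.exp(-lin)):
            return y
```

With `phi1 > 0`, accepted draws are shifted upward relative to the untilted normal, reflecting the assumption that higher unobserved outcomes make dropout more likely.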

The REMTM characterizes both the continuous measurement process and the dropout mechanism as Markov processes. Since the dropout indicators and the repeated measures are conditionally independent given the shared random effects, it is sufficient in the imputation step to sample only the random effects (which can be viewed as a special group of ‘missing data’). Again, we use **ψ**= (**θ**^{T}, **ϕ**^{T})^{T} to denote all parameters, with $\mathbf{\theta}={(\alpha,{\mathbf{\beta}}^{\text{T}},{\sigma}_{\xi}^{2},{\sigma}_{\epsilon}^{2})}^{\text{T}}$ and **ϕ**= (**η**^{T}, *γ*)^{T}. A Gibbs sampler for the REMTM consists of the following steps:

*I-step*: For *i* = 1, …, *N*, draw the random intercepts:

$$\xi_{i}\sim f(\xi_{i}\mid\mathbf{y}_{i}^{\text{obs}},\mathbf{X}_{i},\boldsymbol{\psi})\propto\exp\left\{-\frac{\xi_{i}^{2}}{2\sigma_{\xi}^{2}}\right\}\times\prod_{j=2}^{d_{i}-1}\exp\left\{-\frac{(y_{ij}-(\mathbf{x}_{ij}\boldsymbol{\beta}+(y_{i,j-1}-\mathbf{x}_{i,j-1}\boldsymbol{\beta})\alpha+\xi_{i}))^{2}}{2\sigma_{\epsilon}^{2}}\right\}\times\prod_{j=2}^{d_{i}-1}\frac{1}{1+\exp(\mathbf{x}_{ij}\boldsymbol{\eta}+\xi_{i}\gamma)}\times\frac{\exp(\mathbf{x}_{i,d_{i}}\boldsymbol{\eta}+\xi_{i}\gamma)}{1+\exp(\mathbf{x}_{i,d_{i}}\boldsymbol{\eta}+\xi_{i}\gamma)}$$

*P-step*: Draw the parameters one by one in the following order:

- For *k* = 1, …, *K*, draw the fixed-effects regression coefficients in the measurement model:
$$\beta_{k}\sim f(\beta_{k}\mid\boldsymbol{\psi}_{\backslash\beta_{k}},\boldsymbol{\xi},\mathbf{Y}^{\text{obs}},\mathbf{X},\mathbf{R})\propto\prod_{i=1}^{N}\prod_{j=2}^{d_{i}-1}\exp\left\{-\frac{(y_{ij}-(\mathbf{x}_{ij}\boldsymbol{\beta}+(y_{i,j-1}-\mathbf{x}_{i,j-1}\boldsymbol{\beta})\alpha+\xi_{i}))^{2}}{2\sigma_{\epsilon}^{2}}\right\}$$
- Draw the parameter indicating transition from history in the measurement model:
$$\alpha\sim f(\alpha\mid\boldsymbol{\psi}_{\backslash\alpha},\boldsymbol{\xi},\mathbf{Y}^{\text{obs}},\mathbf{X},\mathbf{R})\propto\prod_{i=1}^{N}\prod_{j=2}^{d_{i}-1}\exp\left\{-\frac{(y_{ij}-(\mathbf{x}_{ij}\boldsymbol{\beta}+(y_{i,j-1}-\mathbf{x}_{i,j-1}\boldsymbol{\beta})\alpha+\xi_{i}))^{2}}{2\sigma_{\epsilon}^{2}}\right\}$$
- Draw the variance of the residuals in the measurement model:
$$\sigma_{\epsilon}^{2}\sim f(\sigma_{\epsilon}^{2}\mid\boldsymbol{\psi}_{\backslash\sigma_{\epsilon}^{2}},\boldsymbol{\xi},\mathbf{Y}^{\text{obs}},\mathbf{X},\mathbf{R})\propto\prod_{i=1}^{N}\prod_{j=2}^{d_{i}-1}\frac{1}{\sqrt{2\pi\sigma_{\epsilon}^{2}}}\exp\left\{-\frac{(y_{ij}-(\mathbf{x}_{ij}\boldsymbol{\beta}+(y_{i,j-1}-\mathbf{x}_{i,j-1}\boldsymbol{\beta})\alpha+\xi_{i}))^{2}}{2\sigma_{\epsilon}^{2}}\right\}$$
- Draw the variance of the random intercepts:
$$\sigma_{\xi}^{2}\sim f(\sigma_{\xi}^{2}\mid\boldsymbol{\psi}_{\backslash\sigma_{\xi}^{2}},\boldsymbol{\xi},\mathbf{Y}^{\text{obs}},\mathbf{X},\mathbf{R})\propto\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma_{\xi}^{2}}}\exp\left\{-\frac{\xi_{i}^{2}}{2\sigma_{\xi}^{2}}\right\}$$
- For *k* = 1, …, *K*, draw the regression coefficients in modeling the dropout mechanism:
$$\eta_{k}\sim f(\eta_{k}\mid\boldsymbol{\psi}_{\backslash\eta_{k}},\boldsymbol{\xi},\mathbf{Y}^{\text{obs}},\mathbf{X},\mathbf{R})\propto\prod_{i=1}^{N}\left[\prod_{j=2}^{d_{i}-1}\frac{1}{1+\exp(\mathbf{x}_{ij}\boldsymbol{\eta}+\xi_{i}\gamma)}\times\frac{\exp(\mathbf{x}_{i,d_{i}}\boldsymbol{\eta}+\xi_{i}\gamma)}{1+\exp(\mathbf{x}_{i,d_{i}}\boldsymbol{\eta}+\xi_{i}\gamma)}\right]$$
- Draw the parameter indicating nonignorability:
$$\gamma\sim f(\gamma\mid\boldsymbol{\psi}_{\backslash\gamma},\boldsymbol{\xi},\mathbf{Y}^{\text{obs}},\mathbf{X},\mathbf{R})\propto\prod_{i=1}^{N}\left[\prod_{j=2}^{d_{i}-1}\frac{1}{1+\exp(\mathbf{x}_{ij}\boldsymbol{\eta}+\xi_{i}\gamma)}\times\frac{\exp(\mathbf{x}_{i,d_{i}}\boldsymbol{\eta}+\xi_{i}\gamma)}{1+\exp(\mathbf{x}_{i,d_{i}}\boldsymbol{\eta}+\xi_{i}\gamma)}\right]$$

Again, noninformative priors are used in the above algorithm [14]. It can be shown that all the above conditional distributions have log-concave forms, directly or after transformation. Thus, every conditional distribution can be simulated using adaptive rejection sampling [54].
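As a simple stand-in for adaptive rejection sampling, which exploits the same log-concavity, a generic univariate slice sampler can draw from any of these conditionals given only the log-density. The function below is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def slice_sample(logf, x0, w=1.0, steps=30, rng=None):
    """Univariate slice sampler: alternately draw an auxiliary height under
    the density and a point uniformly from the resulting horizontal slice,
    located by stepping the bracket out and shrinking it on rejection.
    Returns the chain state after `steps` updates."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    for _ in range(steps):
        logy = logf(x) + np.log(rng.random())    # height under log-density
        lo, hi = x - w, x + w                    # step the bracket out
        while logf(lo) > logy:
            lo -= w
        while logf(hi) > logy:
            hi += w
        while True:                              # shrink until acceptance
            xp = rng.uniform(lo, hi)
            if logf(xp) >= logy:
                x = xp
                break
            if xp < x:
                lo = xp
            else:
                hi = xp
    return x
```

Passing, say, the log of the ξ_i full conditional as `logf` yields one draw per call; the stepping-out bracket makes the sampler robust to a poor initial width `w`.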

Within MPI, for each partially imputed data set, one of the above Gibbs samplers can be used to fit a selection or REMTM model to deal with dropouts. The simulated parameter values are summarized to obtain parameter estimates for each data set, which are then combined to make an MPI inference. The strategy is illustrated in Sections 4.1 and 4.3.

For the two-stage MPI strategy, the above Gibbs samplers for the selection and REMTM models can be modified to create multiple imputations for the dropouts. For the selection-model Gibbs sampler, we only need to modify the I-step so that it draws values for all of the missing measurements after withdrawal, {(*y*_{i,d_i}, *y*_{i,d_i+1}, …): *i* = 1, …, *N*}, rather than for the values at withdrawal alone.

As seen in Section 2.3.2, there are at least three schemes for identifying restrictions on the parameters of pattern-mixture models: CCMV, NCMV, and ACMV. MPI or two-stage MPI provides a convenient framework for implementing these identification schemes. Assuming that the intermittent missing values have been imputed multiple times in a previous round, we now use the pattern-mixture model to create imputations for the dropouts without employing the Bayesian paradigm.

First, we fit a model to the pattern-specific identifiable densities *f*_{j}(*y*₁, …, *y*_{j−1}), i.e. the distributions of the measurements observed before dropout within each pattern *j*. The chosen identifying restriction (CCMV, NCMV, or ACMV) then expresses the conditional distributions of the unobserved measurements given the observed ones in terms of these identifiable densities, and imputations for the dropouts are drawn from the resulting conditionals.
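Under CCMV, for instance, the conditional of the next unobserved visit given the observed history is borrowed from the completers. A regression-based sketch follows; the helper is hypothetical, and a proper implementation would also draw the regression parameters from their posterior so that the imputation is 'proper'.

```python
import numpy as np

def ccmv_impute_next(y_complete, y_pattern, rng):
    """CCMV sketch: regress visit j on visits 1..j-1 among completers and
    use that fitted conditional to impute visit j for subjects who dropped
    out after visit j. y_complete: (Nc, J) fully observed array;
    y_pattern: (Np, j) observed history of one dropout pattern."""
    j = y_pattern.shape[1]
    X = np.column_stack([np.ones(len(y_complete)), y_complete[:, :j]])
    beta, *_ = np.linalg.lstsq(X, y_complete[:, j], rcond=None)
    resid = y_complete[:, j] - X @ beta
    sigma = resid.std(ddof=X.shape[1])           # residual scale
    Xp = np.column_stack([np.ones(len(y_pattern)), y_pattern])
    return Xp @ beta + rng.normal(0.0, sigma, len(y_pattern))
```

Applying the function sequentially, visit by visit, monotonically completes a dropout pattern; NCMV or ACMV would simply change which patterns contribute to the borrowed conditional.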

In this section, we use the carbon monoxide data to illustrate the above imputation-based strategies. To deal with intermittent missing values, ignorability was assumed when generating MPIs. Then, for each partially imputed data set, we dealt with the possibly nonignorable dropouts using the selection, pattern-mixture, and REMTM models.

Using a piecewise linear mixed-effects model under the ignorability assumption on missingness, Shoptaw *et al*. [21] reported a significant treatment effect of CM. Here, we reanalyzed a subset of the data starting from the second week using the strategy of MPI. After logarithmic transformation, the repeated carbon monoxide levels for each participant were viewed as multivariate normally distributed (i.e. **y**_i ~ N(**μ**_i, **Σ**)), and four sets of partial imputations for the intermittent missing values were created under this multivariate normal model.

For each partially imputed data set, we applied the selection model to analyze the carbon monoxide levels after transformation. As seen from Figure 1, the mean carbon monoxide levels decline quickly within the first week from approximately the same starting levels and then level off at different values throughout the rest of the study period. For each partially imputed data set, we used PROC MIXED in SAS to fit linear mixed models with various predictors and covariance structures. Based on model comparison with AIC, the following mean structure with AR(1) covariance was supported by all four data sets. Thus, the AR(1) selection model was used with the following mean structure for the carbon monoxide levels:

$${y}_{ij}={\beta}_{0}+{\beta}_{1}{\text{CM}}_{i}+{\beta}_{2}{\text{RP}}_{i}+{\beta}_{3}{\text{RP}}_{i}{\text{CM}}_{i}+{\beta}_{4}{\text{BaseCO}}_{i}+{\beta}_{5}{\text{Patches}}_{i}$$

where CM_{i} and RP_{i} are indicators for assignment to contingency management and relapse prevention, respectively, BaseCO_{i} is the baseline carbon monoxide level, and Patches_{i} indicates the nicotine patch condition. The dropout mechanism was modeled as

$$\text{logit}(p_{d_{i}}(y_{i,d_{i}},\mathbf{H}_{i,d_{i}}))=\phi_{0}+\phi_{1}y_{i,d_{i}}+\phi_{2}y_{i,d_{i}-1}$$

where *d*_{i} indicates the dropout time of the *i*th participant and **H**_{i,d_i} denotes the history of measurements observed before withdrawal.

By running the Gibbs sampler with noninformative priors for the AR(1) selection model, we obtained estimates of all the parameters for each partially imputed data set. Only the parameters of interest are listed in Table I, from which we see that the between-imputation variance is very small for each parameter. In other words, the fraction of missing information due to intermittent missingness is low. After consolidating the four sets of estimates using Rubin’s rules, the treatment effect of CM is clearly significant ($\widehat{\beta}_{1}$ = −0.28; *T*_{2490} = −5.88 with *p*<0.0001). RP turns out to be ineffective, and there is no significant interaction effect between CM and RP. The regression coefficient $\widehat{\phi}_{1}$ is significantly larger than zero ($\widehat{\phi}_{1}$ = 1.28; *T*_{2024} = 3.86 with *p* = 0.0002), suggesting that the higher the underlying missing value, the larger the probability of dropping out. In other words, the dropouts are possibly outcome-dependent nonignorable.
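The consolidation step uses Rubin's combining rules [46]: average the point estimates, and add the between-imputation variance (inflated by 1 + 1/*m*) to the average within-imputation variance. A minimal sketch:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine m completed-data analyses of one scalar parameter: pooled
    estimate, total variance T = W + (1 + 1/m) B, and the relative
    contribution of the between-imputation component to the total."""
    q = np.asarray(estimates, float)             # m point estimates
    u = np.asarray(variances, float)             # m squared standard errors
    m = len(q)
    qbar = q.mean()                              # pooled point estimate
    w = u.mean()                                 # within-imputation variance
    b = q.var(ddof=1)                            # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b
    return qbar, t, (1.0 + 1.0 / m) * b / t
```

A small between-imputation component, as observed in Table I, signals a low fraction of missing information due to intermittent missingness.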

For the purpose of demonstrating pattern-mixture models, only the efficacy of CM is investigated in the following analyses. We first clustered participants into two groups: completers (*n*_{1} = 112) and early terminators (*n*_{2} = 62). Then, within each group, the efficacy of CM was investigated. As seen from Figure 3, CM seems to be less effective for the early terminators. A linear mixed model with AR(1) covariance structure was selected, with predictors CM_{i}, BaseCO_{i}, and Patches_{i}:

Figure 3. Mean carbon monoxide levels for completers and early terminators. By dividing the 174 smokers into two groups, completers (*n*_{1} = 112) and early terminators (*n*_{2} = 62), the mean curves of carbon monoxide levels for subjects receiving CM (contingency management) are shown within each group.

$${y}_{ij}={\beta}_{0}+{\beta}_{1}{\text{CM}}_{i}+{\beta}_{2}{\text{BaseCO}}_{i}+{\beta}_{3}{\text{Patches}}_{i}$$

This model was applied separately to the completers and the early terminators. Let ${\widehat{\beta}}_{1}^{\text{c}}$ and ${\widehat{\beta}}_{1}^{\text{w}}$ denote the point estimators of *β*_{1} for the completers and the early terminators, respectively, and let ${\widehat{\pi}}_{\text{c}}$ = 64 per cent denote the estimated probability of being a completer. The overall point estimator is then the weighted average ${\widehat{\beta}}_{1}={\widehat{\pi}}_{\text{c}}{\widehat{\beta}}_{1}^{\text{c}}+(1-{\widehat{\pi}}_{\text{c}}){\widehat{\beta}}_{1}^{\text{w}}$, with variance derived using the delta method [56].

Since the fraction of missing information due to intermittent missingness was low, only three sets of imputations for the intermittent missing values were created this time. When conducting the partial imputation, the procedure described in Section 4.1 was applied. The pattern-averaged point estimates and standard errors for *β*_{1} are listed in Table II. After consolidation, the overall point estimate is ${\overline{\widehat{\beta}}}_{1}=-0.25$ with standard deviation $\sqrt{\text{var}({\overline{\widehat{\beta}}}_{1})}=0.13$. The test based on the *t*-statistic gives a *p*-value of 0.06.
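The pattern-averaged estimator and its delta-method variance can be sketched as below. The variance form, which treats the estimated completer proportion as a binomial proportion from *n* subjects, follows Hedeker and Gibbons [56]; the function name is illustrative.

```python
def pattern_averaged_estimate(beta_c, var_c, beta_w, var_w, pi_c, n_subjects):
    """Weighted average of pattern-specific estimates with a delta-method
    variance that also propagates the sampling error of the estimated
    completer proportion pi_c (assumed binomial from n_subjects)."""
    beta = pi_c * beta_c + (1.0 - pi_c) * beta_w
    var = (pi_c ** 2 * var_c
           + (1.0 - pi_c) ** 2 * var_w
           + (beta_c - beta_w) ** 2 * pi_c * (1.0 - pi_c) / n_subjects)
    return beta, var
```

The third variance term vanishes when the two patterns agree, so disagreement between completers and early terminators directly inflates the uncertainty of the averaged effect.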

Estimated treatment effect of contingency management ($\widehat{\beta}_{1}$ (SD)) using the pattern-mixture model with two patterns (complete *versus* dropout).

In the above preliminary analysis, a simple pattern-mixture modeling strategy with only two target dropout patterns (complete *versus* incomplete) was used within the framework of MPI. In the following, we describe the application of restriction-identification strategies within the framework of two-stage MPI for pattern-mixture models with a larger number of dropout patterns.

When the number of target dropout patterns becomes large, the application of pattern-mixture models without imputation becomes less attractive. For example, the mean profiles of carbon monoxide levels across five dropout patterns are plotted in Plate 1, which shows notable within-pattern and across-pattern variation in the trajectories of the carbon monoxide levels. As the number of patterns increases, the number of subjects within each pattern becomes smaller, and it becomes tedious (even infeasible) to conduct pattern-specific analyses and then combine the results across patterns as we did above.

Adopting the procedure described in Section 3.4, three restriction schemes (CCMV, NCMV, and ACMV) were used to make multiple imputations for the dropouts. Within this two-stage MPI, the numbers of imputations were set to *m* = 2 for the intermittent missing values and *n* = 3 for the dropouts. Again, the Gibbs sampling process with the multivariate normal assumption was used for imputing intermittent missing values (see Section 4.1). Hence, we ended up with six complete data sets in total. Each complete data set was then analyzed using a linear mixed model with AR(1) covariance structure and predictors CM, BaseCO, and Patches. Using the consolidation procedure described in Section 3.2.2, the final point estimates and fractions of missing information for the treatment effect of CM are shown in Table III. The *p*-values of one-sided hypothesis tests using the *t*-statistics are also listed. From these results, we can see that the fraction of missing information due to dropout is much higher than that due to intermittent missingness. Two of the three identification strategies strongly support the favorable treatment efficacy of CM.

We reanalyzed the same subset of the carbon monoxide data, within each of the four partially imputed data sets, using the REMTM as we did with the selection model. The carbon monoxide data after dichotomization were analyzed by Yang *et al.* [11] using an REMTM for binary repeated measures subject to intermittent missingness and dropout. Here, the continuous data were analyzed using the hybrid Gibbs sampler for the REMTM with predictors CM_{i}, RP_{i}, RP_{i}CM_{i}, BaseCO_{i}, and Patches_{i}.

Posterior parameter estimates with standard deviations and 95 per cent credible intervals from fitting the REMTM to the continuous carbon monoxide data.

The estimates of ${\sigma}_{\xi}^{2}$ and *γ* jointly suggest that dropout is random-effects dependent, and hence nonignorable. The introduced random-intercept effects (i.e. the *ξ*_{i}'s) capture the heterogeneity in dropout and carbon monoxide levels across subjects. Among all the estimated parameters of

This paper introduces alternative imputation-based strategies for implementing longitudinal models with full-likelihood functions to deal with intermittent missing values and dropouts that are potentially nonignorable. Using the carbon monoxide data set from a smoking cessation clinical trial, we have demonstrated the application of MPI and two-stage MPI to implement selection, pattern-mixture, and shared-random-effects models. We emphasize that the framework of MPI or two-stage MPI provides a very flexible solution for incomplete longitudinal data analysis. When drawing imputations of the intermittent missing values, various modeling options can be employed, depending on the assumption about the missingness mechanism (e.g. MCAR, MAR, or nonignorable). Although the formulation of selection, pattern-mixture, and shared-parameter models is presented here only for repeated measures subject to dropout, similar ideas can conceivably be developed for intermittent missingness. When handling dropouts, we can use another group of advanced modeling options. The models used for the two types of missingness can be entirely different: for example, a selection model can be used for imputing intermittent missing values while a pattern-mixture model is used for imputing dropouts. It is even possible for the imputation of intermittent missing values and the analysis of dropouts to be conducted by different people at different places and times.

Another advantage of imputation-based strategies concerns sensitivity analysis. As discussed earlier, a notable limitation of incomplete data analysis is that the true model and mechanism for the measurements and missing values (including dropouts) are usually unverifiable in practical settings. When making MPI inferences, various combinations of modeling schemes can be implemented under various assumptions. Therefore, MPI provides a useful tool for studying the sensitivity of model-based analytical conclusions. If, for one data set, different MPI modeling strategies end up with inconsistent conclusions regarding the efficacy of the same treatment or intervention, then further investigation should be conducted. We should avoid confidently reporting a strong conclusion based on only one modeling option while other options suggest conflicting results. For the same set of data from the smoking trial, we applied various models to analyze the treatment efficacy of two behavioral therapies, CM and RP. The overall results present a consistent picture supporting the favorable efficacy of CM.

It should also be noted that selection, pattern-mixture, and shared-parameter models are generalized versions of standard longitudinal models (i.e. marginal models using GEE, linear mixed-effects models, and transition models). For example, a linear mixed-effects model ignoring missing values can be viewed as a selection model under the MAR assumption. Although only continuous repeated measures are targeted in this article, the modeling strategies based on the full-likelihood function can be extended to other types of repeated measures.

When describing the various Gibbs sampling algorithms for model fitting or multiple imputation, we have tried to present the technical implementation in as much detail as possible, but a full description of all possible modeling techniques is not feasible here. Most technical details are given in the user manual and the technical report of the MPI software package. When specifying the prior distributions for the parameters, we adopted noninformative priors because there were no historical data or empirical evidence to guide us in eliciting proper priors. As raised by an anonymous reviewer, flat priors can be problematic in practice, especially for parameters related to the missingness or dropout mechanism. This concern is most acute when there are very few missing values, which provide little information for estimating the missingness-related parameters. For our carbon monoxide data set, fortunately, this was not an issue. When monitoring the convergence of the Gibbs samplers, we mainly used two approaches. The first was to generate two or more Markov chains starting from different initial values and wait until they interweaved with each other and the between-chain variance was much smaller than the within-chain variance. The second was to work with one chain, retaining only every *k*th sample after the burn-in period, with *k* set large enough (e.g. 10) that the retained samples are approximately independent. Usually, we stopped the simulation when the quantiles of all, or a selection of, the parameters were stable.
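The first monitoring approach, comparing between- and within-chain variance, is essentially the Gelman–Rubin diagnostic [52]; a minimal version for one scalar parameter:

```python
import numpy as np

def potential_scale_reduction(chains):
    """Compare between- and within-chain variance for draws of one scalar
    parameter (rows = chains, columns = iterations); values close to 1
    indicate that the chains have interweaved and mixed."""
    chains = np.asarray(chains, float)
    m, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled variance estimate
    return np.sqrt(var_hat / w)
```

In practice the statistic is computed after discarding the burn-in portion of each chain; values well above 1 mean the between-chain variance still dominates and the burn-in should be extended.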

This work was supported by the National Institute of Drug Abuse through an SBIR contract N44 DA35513 and three research grants: R03 DA016721, R01 DA09992, and P50 DA18185. We especially thank Hamutahl Cohen for her editorial assistance and the reviewers for their constructive comments.

1. Nich C, Carroll KM. ‘Intention-to-treat’ meets ‘missing data’: implications of alternate strategies for analyzing clinical trials data. Drug and Alcohol Dependence. 2002;68:121–130. [PubMed]

2. Hedeker D, Gibbons RD. A random effects ordinal regression model for multilevel analysis. Biometrics. 1994;50:933–944. [PubMed]

3. Follmann D, Wu M. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168. [PubMed]

4. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.

5. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974. [PubMed]

6. Diggle P, Heagerty P, Liang K-Y, Zeger S. Analysis of Longitudinal Data. 2. Oxford University Press; Oxford: 2002.

7. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2. Wiley; New York: 2002.

8. Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.

9. Robins JM, Rotnitzky AG, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–120.

10. Diggle PJ, Kenward MG. Informative drop-out in longitudinal data analysis. Applied Statistics. 1994;43(1):49–93.

11. Yang X, Nie K, Belin T, Liu J, Shoptaw S. Markov transition models for binary repeated measures with ignorable and nonignorable missing values. Statistical Methods in Medical Research. 2007;16(4):347–364. [PubMed]

12. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer; New York: 2000.

13. Schafer JL. Analysis of Incomplete Multivariate Data. Chapman & Hall; London: 1997.

14. Li J, Yang X, Wu Y, Shoptaw S. A random-effects Markov transition model for Poisson-distributed repeated measures with nonignorable missing values. Statistics in Medicine. 2007;26(12):2519–2532. [PubMed]

15. Molenberghs G, Michiels B, Lipsitz SR. Selection models and pattern-mixture models for incomplete categorical data with covariates. Biometrics. 1999;55:978–983. [PubMed]

16. Albert PS, Follmann DA. Modeling longitudinal count data subject to informative dropout. Biometrics. 2000;56:667–677. [PubMed]

17. Little RJA. Modeling the drop-out mechanism in longitudinal studies. Journal of the American Statistical Association. 1995;90:1112–1121.

18. Troxel AB, Harrington DP, Lipsitz SR. Analysis of longitudinal data with non-ignorable non-monotone missing values. Applied Statistics. 1998;47:425–438.

19. Albert PS, Follmann DA. A random effects transition model for longitudinal binary data with informative missingness. Statistica Neerlandica. 2003;57:100–111.

20. Yang X, Shoptaw S. Assessing missing data assumptions in longitudinal studies: an example using a smoking cessation trial. Drug and Alcohol Dependence. 2005;77:213–225. [PubMed]

21. Shoptaw S, Rotheram-Fuller E, Yang X, Frosch D, Nahom D, Jarvik ME, Rawson RA, Ling W. Smoking cessation in methadone maintenance. Addiction. 2002;97:1317–1328. [PubMed]

22. Jennrich RI, Schluchter MD. Unbalanced repeated-measures models with structured covariance matrices. Biometrics. 1986;42:805–820. [PubMed]

23. Hogan JW, Laird NM. Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine. 1997;16:239–258. [PubMed]

24. Murray GD, Findlay JG. Correcting for the bias caused by drop-outs in hypertension trials. Statistics in Medicine. 1988;7(9):941–946. [PubMed]

25. Heckman JJ. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement. 1976;5:475–492.

26. Fitzmaurice GM, Molenberghs G, Lipsitz SR. Regression models for longitudinal binary responses with informative dropouts. Journal of the Royal Statistical Society: Series B. 1995;57:691–704.

27. Molenberghs G, Kenward MG, Lesaffre E. The analysis of longitudinal ordinal data with non-random dropout. Biometrika. 1997;84:33–44.

28. Nordheim EV. Inference from nonrandomly missing categorical data: an example from a genetic study on Turner’s syndrome. Journal of the American Statistical Association. 1984;79:772–780.

29. Kenward MG, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal studies data. Statistical Methods in Medical Research. 1999;8:51–83. [PubMed]

30. Glynn RJ, Laird NM, Rubin DB. Selection modeling versus mixture modeling with non-ignorable nonresponse. In: Wainer H, editor. Drawing Inferences from Self Selected Samples. Springer; New York: 1986. pp. 115–142.

31. Thijs H, Molenberghs G, Michiels B, Verbeke G, Curran D. Strategies to fit pattern-mixture models. Biostatistics. 2002;3-2:245–265. [PubMed]

32. Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88:125–134.

33. Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.

34. Ekholm A, Skinner C. The Muscatine children’s obesity data reanalysed using pattern mixture models. Applied Statistics. 1998;47:251–263.

35. Hogan JW, Laird NM. Intent-to-treat analysis for incomplete repeated measures. Biometrics. 1996;52:1002–1007. [PubMed]

36. Molenberghs G, Michiels B, Kenward MG, Diggle PJ. Missing data mechanisms and pattern-mixture models. Statistica Neerlandica. 1998;52:153–161.

37. Michiels B, Molenberghs G, Lipsitz SR. A pattern-mixture odds ratio model for incomplete categorical data. Communication in Statistics: Theory and Methods. 1999;28(12):2863–2870.

38. Birmingham J, Fitzmaurice GM. A pattern-mixture model for longitudinal binary responses with nonignorable nonresponse. Biometrics. 2002;58(4):989–996. [PubMed]

39. Birmingham J, Rotnitzky A, Fitzmaurice GM. Pattern-mixture and selection models for analysing longitudinal data with monotone missing patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2003;65(1):275–297.

40. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.

41. Wu MC, Bailey KR. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics. 1989;45:939–955. [PubMed]

42. Wu MC, Follmann DA. Use of summary measures to adjust for informative missingness in repeated measures data with random effects. Biometrics. 1999;55:75–84. [PubMed]

43. Albert PS. A transitional model for longitudinal binary data subject to nonignorable missing data. Biometrics. 2000;56:602–608. [PubMed]

44. Pulkstenis EP, Ten Have TR, Landis JR. Model for the analysis of binary longitudinal pain data subject to informative dropout through remedication. Journal of the American Statistical Association. 1998;93:438–450.

45. Ten Have TR, Kunselman AR, Pulkstenis EP, Landis JR. Mixed effects logistic regression models for longitudinal binary repeated response data with informative drop-out. Biometrics. 1998;54:367–383. [PubMed]

46. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley; New York: 1987.

47. Rubin DB, Schenker N. Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. Journal of the American Statistical Association. 1986;81:366–374.

48. Shen ZJ. PhD Dissertation. Department of Statistics, Harvard University; Cambridge, MA: 2000. Nested multiple imputation.

49. Harel O. PhD Dissertation. Department of Statistics, Pennsylvania State University; University Park, PA: 2003. Strategies for data analysis with two types of missing values.

50. Rubin DB. Nested multiple imputation of NMES via partially incompatible MCMC. Statistica Neerlandica. 2003;57:3–18.

51. Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. Chapman & Hall; London: 1996.

52. Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association. 1996;91:883–904.

53. Yang X, Li J. A hybrid Gibbs sampler for selection models in dealing with outcome-dependent nonignorable dropouts. UCLA Statistics Electronic Publications. 2005 Preprint 451.

54. Gilks WR, Wild P. Adaptive rejection sampling for Gibbs sampling. Applied Statistics. 1992;41:337–348.

55. SAS Institute Inc. SAS/STAT Software Changes and Enhancements, Release 8.2. SAS Institute Inc; 2001.

56. Hedeker D, Gibbons RD. Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods. 1997;2:64–78.
