Stat Med. Author manuscript; available in PMC 2011 February 2.
Published in final edited form as:
PMCID: PMC3032542

Imputation-based strategies for clinical trial longitudinal data with nonignorable missing values


Biomedical research is plagued with problems of missing data, especially in clinical trials of medical and behavioral therapies adopting longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts with or without further imputation. Depending on the choice of imputation model or measurement model, there exist various strategies that can be jointly applied to the same set of data to study the effect of treatment or intervention from multi-faceted perspectives. For illustration, the strategies were applied to a data set with continuous repeated measures from a smoking cessation clinical trial.

Keywords: multiple partial imputation, selection model, pattern-mixture model, Markov transition model, nonignorable dropout, intermittent missing values


1.1. Background

In biomedical research with clinical trials, the effectiveness of a treatment or intervention method is often investigated by adopting a longitudinal design where each subject is repeatedly measured on the variables of interest throughout a period of time. In such studies, there are often missing values reflecting the problematic nature of the phenomenon under study, such as substance abuse [1] and mental health disorder [2]. The proportion of missingness is sometimes notably large, e.g. 70 per cent at termination in a randomized trial of buprenorphine versus methadone in treating addiction to cocaine use [3]. Although investigators may devote substantial effort to minimizing the number of missing values, some amount of missingness is often inevitable in practice.

A convenient way for incomplete longitudinal data analysis is to ignore those missing values and fit a model based on all available data. Currently, three groups of longitudinal models are popularly used in this way: marginal models with generalized estimating equations (GEE) for studying the group-averaged characteristics [4], mixed models with random effects used to describe heterogeneity among individuals [5], and transition models, which model the sequence of responses from a person dynamically by conditioning on previous observations and baseline features [6]. Although different assumptions are required by each modeling option, they essentially require that missing values be at least ‘ignorable,’ i.e. the indicator variable for whether a measure is missing is independent of the value of that measure given other observed measures and covariates [7, 8]. Here, the independence between the missingness indicators and missing values also requires that the parameters in modeling the repeated measures and those modeling the mechanism of missingness are distinct. Without specifying likelihood functions, the method of GEE offers a flexible modeling strategy. Although the standard version of GEE requires a stronger assumption, various adjusting methods have been proposed to handle ignorable missing values [9].

Unfortunately, there are some studies where empirical evidence suggests that ignorability is implausible [10–12]. In such cases, when standard modeling options are used, invalid statistics or biased estimators may be obtained. Hence, advanced modeling strategies assuming nonignorable missingness become desirable. In recent years, several lines of modeling development for nonignorable missingness have been initiated based on modeling the joint distribution of the indicators of missingness and the values of the repeated measures (including observed and missing values). The likelihood function derived from the joint distribution is called the full-likelihood function [13]. As summarized by Li and coworkers [14], at least three factorizations of the joint distribution could be considered: (1) outcome-dependent factorization, where missingness indicators are assumed to be conditioned on the values of repeated measures; (2) pattern-dependent factorization, where the distribution of repeated measure values is a mixture of distributions for subjects within sub-groups determined by the patterns of missingness; and (3) parameter-dependent factorization, where repeated measure values and missingness indicators are conditionally independent of each other given a group of shared parameters. Correspondingly, three models could be conceived according to the way of factorization and are termed, respectively, selection models, pattern-mixture models, and shared-parameter models.

Compared with intermittent missingness (occasional omission), dropout (premature withdrawal) usually leads to a larger proportion of missingness, and the mechanism for dropout is often associated with both missing and observed values (as well as baseline covariates such as treatment assignment). Hence, dropout is more problematic. To deal with nonignorable dropouts (i.e. missing values after withdrawal), Diggle and Kenward [10] proposed the method of selection modeling while comparing diets of cows for increasing their milk protein content. Using the data set from a multi-center trial with a parallel group design to study the efficacy of Vorozole for treating breast cancer, Molenberghs and colleagues [15] developed strategies for fitting pattern-mixture models for dropouts. For repeatedly measured count data subject to dropout, Albert and Follmann [16] introduced the prototype of a shared-parameter model. For a detailed review on modeling dropout mechanisms, refer to Little [17]. The presence of intermittent missing values in addition to dropouts further complicates the modeling procedures. Unlike dropouts, the patterns of intermittent missingness can take any nonmonotonic form; intermittent missingness is therefore technically difficult to handle, although conceptually easy. Troxel and coauthors [18] extended the selection model to include the case of intermittent missing values. A breakthrough contribution was given by Albert and Follmann [19], who proposed the shared-random-effects Markov transition model (REMTM) to deal with nonignorable missingness in longitudinal binary data. This model was further generalized by Li et al. [14] to accommodate Poisson-distributed count measures.

Since the subjects in question still remain in the study, we may be able to assume that intermittent missingness is ignorable [6] when the proportion of missingness is moderate. Adopting this point of view, Yang and Shoptaw [20] developed the idea of multiple partial imputation (MPI) to assess dropout mechanisms when there are intermittent missing values. Within the framework of MPI, intermittent missing values are first assumed to be ignorable and imputed using an outcome-dependent modeling technique; then the partially imputed data sets are analyzed to investigate the dropout mechanism; and finally, the multiple versions of assessment are combined to make one final set of inferential statements.

The approach of MPI provides a much more general framework, not only for data exploration but also for analysis purposes. This article proposes strategies for implementing incomplete longitudinal models to analyze continuous repeated measures with intermittent missing values and dropouts. The article is organized as follows. We begin by introducing a motivating practical data set from a smoking cessation clinical trial where a moderate amount of missing values is seen. In Section 2, we discuss modeling strategies based on full-likelihood functions for incomplete longitudinal data, with special attention to handling nonignorable dropouts. In Section 3, we propose imputation-based strategies and Markov chain Monte Carlo (MCMC) algorithms for implementing various incomplete longitudinal models. The smoking cessation data set is then analyzed using the above strategies, and finally we give some practical guidelines for using the proposed imputation-based strategies.

1.2. A motivating study

The development of this work was closely related to the analysis of a clinical trial of smoking cessation in methadone-maintained tobacco smokers [21]. This study tested the effectiveness of a relapse prevention (RP) program and a contingency management (CM) program, alone and in combination, for improving smoking cessation outcomes using nicotine transdermal pharmacotherapy. A total of 174 participants were randomly assigned to one of the four behavioral treatment groups: a control group that received no behavioral therapy (42 subjects); RP-only (42 subjects); CM-only (43 subjects); and a combined RP+CM condition (47 subjects). Thirty-six measures of carbon monoxide levels in expired breath were scheduled to be taken on each participant over the 12-week study period, three times per week.

Figure 1 depicts the mean values of observed carbon monoxide levels for the four treatment groups, after a log(1 + y) transformation. Also depicted are standard deviations and point-wise ANOVA results with p-values smaller than 0.01 after ignoring missing values. A problem with this exploratory analysis is that the 36 p-values cannot be easily combined in making inferences regarding overall differences. Additionally, the comparison between treatment conditions when missing values are ignored may lead to biased conclusions when missingness is not completely at random. For example, if smokers in the three treatment groups dropped out with higher probabilities given a higher level of previously observed carbon monoxide while smokers in the control group dropped out completely at random, then mean levels of carbon monoxide in the treatment groups would turn out to be lower than those in the control group at visit times close to the termination of the study, even though there are no treatment effects at all.

Figure 1
The average and SD curves for the log-scaled carbon monoxide levels. On this plot, the four mean curves of the log-scaled carbon monoxide levels and the corresponding pointwise standard errors are drawn for each of the four treatment conditions: Control, ...

In Figure 2, the patterns of missingness are plotted for each treatment group, after a sorting process on the dropout times. From the graphs, it is seen that missingness due to dropout corresponds to monotonic forms. At the termination of the study, up to 36 per cent of the participants had withdrawn, and intermittent missing values account for an overall 4.3 per cent. The patterns and rates of missing values in this study are typical in substance abuse research. Assuming that intermittent missing values and dropouts were ignorable, random-effects models were applied to the whole incomplete data set and a significantly favorable effect of CM was reported in Shoptaw et al. [21]. In Section 4, we will reanalyze this set of carbon monoxide levels to illustrate various modeling strategies introduced in the following two sections.

Figure 2
Missingness patterns for the carbon monoxide levels across treatment conditions. For each treatment condition, an image depicts the missingness indicators of carbon monoxide levels for each smoker at each research visit. Dark colored area indicates that ...


For a longitudinal data set with balanced design, J repeated measures are potentially observed on each of the N independent subjects at times ti1, …, tiJ (i = 1, …, N; j = 1, …, J). For the following discussion, we use capital letters to represent variables (e.g. Y1, …, YJ indicate response variables, and X1, …, XK indicate covariates or explanatory variables) and lowercase letters for values, which may be observed or missing (e.g. yij denotes the value of Yj and xijk denotes the value of Xk recorded at time tij; k = 1, …, K). Bold symbols are used to represent vectors or matrices: yi = (yi1, …, yiJ)T indicates values of repeated measures; Xi = [xijk]J×K consists of possibly time-varying covariates for the ith subject. Assuming that repeated measures are distributed as multivariate normal, a repeated-measures model with structured covariance matrix can be expressed as yi = Xi β + εi, where εi ~ N(0, Σi) and β is a vector of fixed-effects parameters. Various ways of parameterization of the covariance matrix Σi are conceivable, allowing various forms for the specification of a wide range of repeated-measures models [22].
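To make the parameterization concrete, the sketch below (not part of the original paper; `ar1_cov` and its arguments are illustrative names) builds one common choice of structured covariance, a first-order autoregressive Σi, in plain Python:

```python
def ar1_cov(J, sigma2, rho):
    """J x J first-order autoregressive (AR(1)) covariance matrix:
    Cov(eps_ij, eps_ik) = sigma2 * rho**|j - k|."""
    return [[sigma2 * rho ** abs(j - k) for k in range(J)] for j in range(J)]

# A 4-visit example: constant variance on the diagonal,
# correlation decaying geometrically with the time lag.
Sigma = ar1_cov(4, sigma2=1.0, rho=0.6)
```

Other structures (compound symmetry, banded, unstructured) follow the same pattern with a different formula per entry.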

2.1. Models with full-likelihood function

When values of some measures are missing, we partition yi into two parts, yi = (yi^obs, yi^mis)^T, where yi^obs indicates the observed values and yi^mis indicates values that would be observed if they were not missing. For convenience, we also introduce a vector of missingness indicators, ri = (ri1, …, riJ)^T, where rij = 0 (or 1) indicates whether yij is observed (or missing), and R = [rij]N×J represents the missingness patterns for the whole data matrix Y = [yij]N×J. Theoretically, the joint distribution of the observed data and missingness patterns should be modeled in statistical analysis based on the full-likelihood function, i.e.

L(θ, ϕ | yi^obs, Xi, ri) ∝ ∏_{i=1}^{N} ∫ f(yi, ri | Xi, θ, ϕ) dyi^mis

where vectors θ and ϕ, respectively, represent the parameters of the measurement model and those of the missingness mechanism. Determined by possible causal pathways, there exist at least three approaches to decompose the joint distribution of the complete data and missingness indicators: outcome-dependent factorization, pattern-dependent factorization, and parameter-dependent factorization. Accordingly, we have the following models for incomplete longitudinal data.

Selection model factors the joint distribution f (yi, ri|Xi, θ, ϕ) into a marginal distribution of yi and a conditional distribution of ri given yi (i.e. outcome dependent),

f(yi, ri | Xi, θ, ϕ) = f(yi | Xi, θ) f(ri | yi, Xi, ϕ)

where f (ri|yi, Xi, ϕ) can be interpreted as ‘self-selection of the ith subject into a specific missingness-pattern group.’

Pattern-mixture model is a pattern-dependent model, assuming that distribution of repeated measures varies with the missingness patterns and the joint distribution is factored as

f(yi, ri | Xi, θ, ϕ) = f(yi | ri, Xi, θ) f(ri | Xi, ϕ)

Assuming that there are P patterns of missingness in a data set, the marginal distribution of yi would be a mixture of pattern-specific distributions, f(yi) = Σ_{p=1}^{P} f(yi | ri = p, Xi, θ^(p)) πp, where θ^(p) represents the parameters of f(yi) in the pth pattern, πp = Pr(ri = p | Xi, ϕ), and the indicator ri here indexes the missingness patterns.

Shared-parameter model assumes that yi and ri are conditionally independent of each other, given a group of parameters ξi, i.e.

f(yi, ri | Xi, θ, ϕ) = ∫ f(yi | ξi, Xi, θ) f(ri | ξi, Xi, ϕ) f(ξi) dξi

From the viewpoint of causation, shared ‘parameters’ play the role of confounders for the relationship between yi and ri and, hence, can be either observable attributes (e.g. gender) or latent factors (e.g. random effects).

2.2. Ignorability

In certain biomedical studies, both missingness patterns and values of repeated measures are of interest. For example, in a heart-disease study, the repeatedly measured blood pressures and the survival lengths (a form of dropout patterns) of the patients are apt to be modeled jointly [23]. Within this scenario, the above selection, pattern-mixture, and shared-parameter models can be applied directly or after some modification. In the majority of biomedical research, however, only the parameters for the distribution of the repeated measures (i.e. θ) are of interest, while those related to missingness patterns are viewed as nuisance parameters. In this latter scenario, it would be desirable that we could ignore the missing values when making inferences regarding θ.

Within the setting of selection models, the concept of ‘ignorability’ was defined by Rubin [8] and extensively addressed thereafter. Missing values are said to be ignorable when two conditions hold: (i) ri is independent of yi^mis, given yi^obs and Xi, and (ii) θ and ϕ are distinct. Under ignorability, the log-likelihood function for θ can be separated from the log-likelihood function for ϕ, i.e. l(θ, ϕ | yi^obs, ri) = l(θ | yi^obs) + l(ϕ | yi^obs, ri). Little and Rubin [7] further classified this type of ignorability (based on outcome-dependent factorization) into two sub-categories: MCAR (i.e. Pr(ri | yi, ϕ) = Pr(ri | ϕ)) and MAR (i.e. Pr(ri | yi, ϕ) = Pr(ri | yi^obs, ϕ)). For intermittent missing values, ignorability or nonignorability can be interpreted as whether the missing values can be unbiasedly interpolated from neighboring observed values. For dropouts, the diagnosis of ignorability corresponds to testing whether missing values after dropout can be unbiasedly extrapolated from the previously observed values. In certain applications, occasional omissions or nonresponses happen for reasons unrelated to the outcome (e.g. schedule conflicts or bad weather), and the missing values can be unbiasedly imputed from observed values or follow-up inquiries; ignorability can therefore be assumed for them. Subjects who withdraw from a study, however, usually do so for study-related reasons (e.g. dissatisfaction with the intervention or its side effects), with a mechanism similar to outcome-dependent censoring; hence, dropouts are usually nonignorable [6, 12, 24].

Within the context of pattern-mixture or shared-parameter models, we define ignorability as a condition under which observed data can be used to estimate θ without bias. For pattern-mixture models, so long as yi^mis does not depend on ri (given yi^obs and Xi), missing data are thought to be ignorable. For shared-parameter models, ignorability corresponds only to the case where ξi are observable confounders, which are usually viewed as a subset of Xi. Otherwise, when latent variables such as random effects are shared, a shared-parameter model is generally associated with the assumption of nonignorability. It is also noted here that ‘informative’ is sometimes used to describe a specific form of nonignorability, e.g. ‘informative dropout’ (within the context of selection models [10]) and ‘informative process for missingness’ (within the context of shared-parameter models [19]).

2.3. Modeling nonignorable dropouts

As seen in Figure 2, dropout patterns display monotonic forms after sorting on the time of withdrawal. This feature makes it easier to characterize the dropout mechanism. Also, considering the fact that dropouts are more problematic in practice, we focus on modeling nonignorable dropouts in this paper.

2.3.1. Selection model

Let t_di denote the dropout time for the ith subject, where 2 ≤ di ≤ J + 1 (di = J + 1 indicates a subject who completed the study). Then, the missingness indicator ri is a vector of di − 1 consecutive zeros followed by J + 1 − di consecutive ones. Suppressing the dependence on covariates, the selection model of Diggle and Kenward [10] assumes: (i) Pr(rij = 1 | j > di) = 1; (ii) for j ≤ di, rij depends on yij and its history Hij = (yi1, …, y_{i,j−1})^T; and (iii) the conditional distribution of yij given Hij is fij(y | Hij, θ). The full-likelihood function for the ith subject can be expressed as

Li(θ, ϕ | yi^obs, ri) ∝ ∏_{j=1}^{di−1} f(yij | Hij, θ) ∏_{j=1}^{di−1} [1 − pj(yij, Hij)] Pr(r_{i,di} = 1 | H_{i,di})

where pj(yij, Hij) = Pr(rij = 1 | yij, Hij, ϕ) represents the probability of dropout at time tij. The dropout probability Pr(r_{i,di} = 1 | H_{i,di}) = ∫ Pr(r_{i,di} = 1 | y, H_{i,di}, ϕ) f_{i,di}(y | H_{i,di}, θ) dy if di < J + 1 (y represents the possible value of y_{i,di}), and Pr(r_{i,di} = 1 | H_{i,di}) = 1 if di = J + 1. A natural choice for calculating Pr(rij = 1 | yij, Hij, ϕ) is a logistic regression model:

logit(Pr(rij = 1 | yij, Hij, ϕ)) = ϕ0 + ϕ1 yij + Σ_{k=2}^{j} ϕk y_{i,j+1−k}   (i = 1, …, N; j = 1, …, J)

where ϕ1 with a nonzero value implies an outcome-dependent nonignorable dropout mechanism.
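As a numerical illustration of this logistic dropout model (the function name and parameter values below are hypothetical, not from the paper), the probability of dropout at one visit can be computed from the current value and its history:

```python
import math

def dropout_prob(y_current, history, phi):
    """Logistic dropout probability Pr(r_ij = 1 | y_ij, H_ij, phi).

    phi = [phi0, phi1, phi2, ...]: intercept, coefficient on the
    (possibly unobserved) current value y_ij, then coefficients on
    the history in reverse time order (y_{i,j-1}, y_{i,j-2}, ...).
    History terms with no corresponding coefficient are ignored.
    """
    eta = phi[0] + phi[1] * y_current
    for k, y_past in enumerate(reversed(history)):
        if 2 + k < len(phi):
            eta += phi[2 + k] * y_past
    return 1.0 / (1.0 + math.exp(-eta))

# phi1 = 0: dropout depends only on observed history (MAR-type mechanism).
p_mar = dropout_prob(3.0, [2.5, 2.0], [-2.0, 0.0, 0.5])
# phi1 > 0: a higher current (possibly missing) value raises the
# dropout probability, i.e. an outcome-dependent nonignorable mechanism.
p_nonign = dropout_prob(3.0, [2.5, 2.0], [-2.0, 0.8, 0.5])
```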

The full log-likelihood function of the whole data set with sample size N for θ and ϕ can be partitioned as l(θ, ϕ) = l1(θ) + l2(ϕ) + l3(ϕ, θ), where l1(θ) = Σ_{i=1}^{N} log{f(yi^obs)} corresponds to the observed-data log-likelihood function for θ, while l2(ϕ) = Σ_{i=1}^{N} Σ_{j=1}^{di−1} log{1 − pj(Hij, yij)} and l3(ϕ, θ) = Σ_{i: di ≤ J} log{Pr(r_{i,di} = 1 | H_{i,di})} together determine the log-likelihood function of the dropout process. If dropouts are ignorable, then l3(ϕ, θ) depends only on ϕ and can be absorbed into l2(ϕ); thus estimation of θ can be derived solely from l1(θ).

As shown by Verbeke and Molenberghs [12], the idea of the selection model can be traced back to the Tobit model of Heckman [25]. Later, Troxel et al. [18] further extended it to handle nonmonotone missing values. Selection models for categorical and other types of repeated measures have also been developed; see [26–29].

2.3.2. Pattern-mixture model

The high sensitivity of selection modeling to misspecification of the measurement process and the dropout mechanism has led to a growing interest in pattern-mixture modeling [30, 31]. Since their initial introduction [32, 33], pattern-mixture models have received increasing attention, e.g. for continuous repeated measures [17, 34–37] and for categorical measures [15, 38, 39].

For dropouts, a pattern-mixture model factorizes the joint distribution f(yi, di | Xi, θ, ϕ) into the product of the marginal distribution f(di | Xi, ϕ) and the conditional distribution f(yi | Xi, θ^(di)), where di = 2, …, J + 1 indicates the dropout time. A major challenge in pattern-mixture modeling is parameter identification. For any subject with early withdrawal (di < J + 1), the sub-vector of θ^(di) characterizing the distribution of missing values (yi^mis) is generally unidentified unless certain restrictions are applied. Thijs et al. [31] proposed a framework for identifying restrictions. Suppressing the subscript i and replacing di with j, we can express the full probability density function for the pattern with dropout time at tj as

fj(y) = fj(y^obs) fj(y^mis | y^obs)

where y^obs = (y1, …, yj)^T and fj(y^mis | y^obs) is the conditional distribution, which cannot be identified within the jth pattern. By borrowing information from the observed data in other patterns where the corresponding measurements ys ∈ y^mis (s = j + 1, …, J) are observed, it is possible to restrict fj(y^mis | y^obs). After introducing proper weights (i.e. Σ_{t=s}^{J} ω_{st} = 1), we can identify fj(ys | y1, …, y_{s−1}) by

fj(ys | y1, …, y_{s−1}) = Σ_{t=s}^{J} ω_{st} ft(ys | y1, …, y_{s−1}),   s = j + 1, …, J

Using this restriction method, the full density function fj (y) can be expressed as

fj(y) = fj(y^obs) ∏_{s=0}^{J−j−1} [ Σ_{t=J−s}^{J} ω_{J−s,t} ft(y_{J−s} | y1, …, y_{J−s−1}) ]

Depending on the specification of the weights, several schemes of identification can be implemented. Setting all the weights to positive values corresponds to the identification scheme called available case missing values (ACMV [36]), which is the natural counterpart of the MAR mechanism in the context of selection models. The restriction scheme called complete-case missing values (CCMV [32]) identifies fj(ys | y1, …, y_{s−1}) by borrowing information only from the subjects who completed the study, i.e.

fj(ys | y1, …, y_{s−1}) = fJ(ys | y1, …, y_{s−1}),   s = j + 1, …, J

In this case, the weights are set as ω_{s,J} = 1 and ω_{s,s} = ω_{s,s+1} = ··· = ω_{s,J−1} = 0. Another special identification scheme—neighboring case missing values (NCMV)—borrows information from the nearest pattern with observed values on ys, i.e.

fj(ys | y1, …, y_{s−1}) = fs(ys | y1, …, y_{s−1}),   s = j + 1, …, J

which corresponds to ω_{s,s} = 1 and ω_{s,s+1} = ω_{s,s+2} = ··· = ω_{s,J} = 0.
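For a fixed index s and final visit J, the CCMV and NCMV weight assignments above can be written down mechanically. The helper below is an illustrative sketch (names hypothetical, patterns indexed by the visit at which they are last observed); ACMV is omitted because its weights depend on the pattern probabilities:

```python
def restriction_weights(scheme, s, J):
    """Return the identifying-restriction weights omega_{s,t} (t = s..J)
    used to identify f_j(y_s | y_1..y_{s-1}) from the patterns t in which
    y_s is observed, as a dict {t: weight}."""
    if scheme == "CCMV":   # borrow only from the completers' pattern t = J
        return {t: (1.0 if t == J else 0.0) for t in range(s, J + 1)}
    if scheme == "NCMV":   # borrow only from the nearest pattern t = s
        return {t: (1.0 if t == s else 0.0) for t in range(s, J + 1)}
    raise ValueError("ACMV weights depend on the pattern probabilities")

w_ccmv = restriction_weights("CCMV", s=3, J=5)   # weight 1 on pattern t = 5
w_ncmv = restriction_weights("NCMV", s=3, J=5)   # weight 1 on pattern t = 3
```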

2.3.3. Shared-parameter model

When the dynamic transition features in longitudinal data are of interest, an appropriate analytical approach is via transition models, which use previous observations to predict current ones. Here, we propose a shared-parameter model, termed REMTM, to deal with continuous repeated measures subject to nonignorable dropout. Similar models have been proposed to analyze binary or count measures with nonmonotone missing values [14, 19]. Within the REMTM, for each subject the repeated measures (yi) are conditionally independent of the missingness indicators (ri), given the shared parameters, namely the random effects (ξi). Therefore, we can separately characterize the measurement process p(yi | xi, θ, ξi) and the dropout process p(ri | xi, ϕ, ξi).

To model the measurement process, an order-1 Markov chain can be assumed for yi = (yi1, …, yiJ)^T, where yij is independent of (yi1, …, y_{i,j−2})^T given the previous observation y_{i,j−1}. To capture the baseline heterogeneity across subjects, we use a random-intercept effect (ξi). Therefore, the part of the REMTM characterizing the measurement process is itself a regression type of linear transition model,

yij = xij β + α (y_{i,j−1} − x_{i,j−1} β) + ξi + εij

where ξi ~ iid N(0, σξ²) denotes the random intercept for subject i and εij ~ iid N(0, σε²) represents the residual error as in standard linear regression models. Of primary interest, β contains the fixed parameters regarding treatment efficacy in clinical trials. The parameter α links the previous and the current unexplained measurement effects, i.e. y_{i,j−1} − x_{i,j−1} β and yij − xij β.

To model the dropout process, a logistic transition model with random intercepts is used. For missingness indicators ri = (ri1, …, riJ)^T with rij = 0 (or 1) if yij is observed (or missing due to dropout), a first-order Markov chain is assumed with a 2×2 matrix of transition probabilities: P_lk = Pr(rij = k | r_{i,j−1} = l) (l = 0 or 1; k = 0 or 1). By the definition of dropout, we always have P_10 = 0 and P_11 = 1. The transition probabilities P_00 and P_01 are calculated as

P(rij = k | ξi, xij, r_{i,j−1} = 0) = 1 / (1 + exp(xij η + ξi γ))   if k = 0
P(rij = k | ξi, xij, r_{i,j−1} = 0) = exp(xij η + ξi γ) / (1 + exp(xij η + ξi γ))   if k = 1

where the parameters η calibrate the influence of covariates on the probability of dropout. The parameter γ indicates whether the dropout process shares the random intercepts with the measurement process; hence, it tells us whether dropout is informative.

By combining the above sub-models for measurement and dropout, we can write the full-likelihood function for the parameters θ = (β^T, σξ², σε²)^T and ϕ = (η^T, γ)^T, i.e.

L(θ, ϕ) ∝ ∏_{i=1}^{N} ∫ [ {∏_{j=1}^{di−1} p(yij | xij, y_{i,j−1}, ξi, θ)} {∏_{j=1}^{di} p(rij | xij, r_{i,j−1}, ξi, ϕ)} p(ξi) ] dξi

where p(ξi) is the density function for the normally distributed random intercept ξi. This model could be naturally generalized to deal with more complicated cases, e.g. incomplete data with intermittent missingness and dropout, nested random effects, or other types of shared parameters [3, 16, 40–45].
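To see how a nonzero γ makes dropout informative, one can simulate jointly from the two sub-models. The sketch below is an illustrative simulation under simplifying assumptions (a single scalar covariate fixed at 1 and hypothetical parameter values), not the estimation procedure of the paper:

```python
import math
import random

def simulate_remtm(N, J, beta, alpha, sigma_xi, sigma_eps, eta, gamma, seed=1):
    """Simulate continuous outcomes and dropout from a shared
    random-intercept Markov transition model with covariate x = 1.
    Returns a list of (observed_y, dropped_flag) per subject."""
    random.seed(seed)
    data = []
    for i in range(N):
        xi = random.gauss(0.0, sigma_xi)        # shared random intercept
        y = [beta + xi + random.gauss(0.0, sigma_eps)]   # first visit
        dropped = False
        for j in range(1, J):
            # Dropout hazard shares xi with the measurement process:
            # gamma != 0 means subjects with unusual intercepts drop
            # out at different rates, i.e. informative dropout.
            p_drop = 1.0 / (1.0 + math.exp(-(eta + gamma * xi)))
            if random.random() < p_drop:
                dropped = True
                break
            y_cur = (beta + alpha * (y[-1] - beta) + xi
                     + random.gauss(0.0, sigma_eps))
            y.append(y_cur)
        data.append((y, dropped))
    return data

data = simulate_remtm(N=50, J=10, beta=2.0, alpha=0.5,
                      sigma_xi=1.0, sigma_eps=0.5, eta=-2.0, gamma=1.0)
```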


In practical settings, it is common to have longitudinal data sets with both intermittent missing values and dropouts. To deal with such data sets, this section proposes a series of imputation strategies to implement the full-likelihood-based models introduced earlier.

3.1. MPI and two-stage MPI

Imputation, which fills the empty cells in a data matrix with ‘plausible’ values predicted from empirical evidence or assumption-driven models, has become a popular device in statistical analysis. Imputation turns an incomplete data set into a complete one, so that standard complete-data modeling techniques can be applied afterward. For repeated measures, the method of imputation is especially useful since repeated measures are often highly correlated. By imputing a data set once and treating it as the actual complete data set, however, the uncertainty in parameter estimation is underestimated. To overcome this limitation, Rubin [46] proposed the method of multiple imputation, within which multiple sets of imputed values are generated for the same set of missing values.

3.1.1. Inference via MPI

As argued earlier, intermittent missingness and dropout differ not only in pattern but also in mechanism, and hence should be treated in different ways. Yang and Shoptaw [20] introduced the idea of MPI by conducting imputations only for the intermittent missing values so that dropouts can be isolated and treated differently. MPI potentially offers a generic solution that can be applied throughout the whole spectrum of longitudinal data analysis. Partitioning yi^mis into (yi^IM, yi^DM) to denote intermittent missing values and dropouts, the first step of MPI is to draw m > 1 sets of imputations for the intermittent missing values: yi^IM(1), yi^IM(2), …, yi^IM(m) (i = 1, …, N). In the second step, each partially imputed data set (Y^obs, Y^IM(j)) (j = 1, …, m), along with X (i.e. data on covariates), is treated with a selection, pattern-mixture, shared-parameter, or any other modeling strategy to deal with potentially nonignorable dropouts. Finally, in the third step, the multiple versions of analysis are consolidated to derive an overall inference.

For this final step, a set of rules for consolidation was originally developed by Rubin [46] and later improved by Rubin and Schenker [47]. For the jth (j = 1, …, m) imputed data set, we denote Q̂^(j) as the point estimate of Q (a parameter or quantity of interest) and Û^(j) as the corresponding variance estimate. Then the MPI estimate (overall point estimate) of Q is Q̄ = (1/m) Σ_{j=1}^{m} Q̂^(j). The associated variance of Q̄ is T = Ū + ((m + 1)/m) B, where Ū = (1/m) Σ_{j=1}^{m} Û^(j) and B = (1/(m − 1)) Σ_{j=1}^{m} (Q̂^(j) − Q̄)², respectively, represent the within-imputation variability and the between-imputation variability. For hypothesis testing, we can use the statistic (Q − Q̄) T^{−1/2}, which approximately follows a t-distribution with degrees of freedom ν = (m − 1)[1 + Ū/((1 + m^{−1}) B)]². Making proper MPI inferences requires that the multiple imputations be created ‘independently.’ In Section 3.2, we will see how to create independent imputed values via MCMC algorithms.
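Rubin's combining rules are straightforward to implement; the following sketch (function name hypothetical) returns the overall estimate, total variance, and degrees of freedom described above:

```python
def rubin_combine(q_hats, u_hats):
    """Combine m completed-data point estimates (q_hats) and their
    variance estimates (u_hats) with Rubin's rules.  Returns the
    overall estimate, total variance, and the degrees of freedom
    for the approximate t reference distribution."""
    m = len(q_hats)
    q_bar = sum(q_hats) / m
    u_bar = sum(u_hats) / m                              # within-imputation
    b = sum((q - q_bar) ** 2 for q in q_hats) / (m - 1)  # between-imputation
    t = u_bar + (1 + 1 / m) * b                          # total variance
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df

# Three imputed analyses of the same quantity:
q_bar, t, df = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```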

3.1.2. Inference via two-stage MPI

Within the second step of the above MPI inferential procedure, additional multiple imputations can be made for the dropouts when fitting a selection, pattern-mixture, or shared-parameter model. That is, for each partially imputed data set (Y^obs, Y^IM(j)), we draw n > 1 sets of imputations for the dropouts: yi^DM(j,1), yi^DM(j,2), …, yi^DM(j,n) (i = 1, …, N). For emphasis, this version of MPI with sequential imputations is called two-stage MPI in this article. When creating imputations for dropouts, the predictive density functions p(yi^DM(j,k) | yi^obs, yi^IM(j)) have different forms depending on the modeling strategy used (j = 1, …, m; k = 1, …, n). In Section 4.2, we will see that this method is especially useful for implementing pattern-mixture models. After the two-stage imputation, we obtain m * n complete data sets with yi^(j,k) = (yi^obs, yi^IM(j), yi^DM(j,k)) (i = 1, …, N; j = 1, …, m; k = 1, …, n). Each can be analyzed by traditional longitudinal models, e.g. marginal models with GEE or linear mixed-effects models, since concerns regarding the missingness and dropout mechanisms have been resolved during the model-based imputation processes. Similar to MPI, the last step of two-stage MPI is to derive the overall inference by combining the multiple analytical results. Nonetheless, Rubin's rules presented earlier cannot simply be applied by presuming that the number of imputations is now m * n instead of m: among the m * n complete data sets, the imputed values are far from independent of each other, because each block (yi^(j,1), …, yi^(j,n)) contains identical imputed values yi^IM(j). Adopting the idea of ANOVA with nested blocks, a modified set of Rubin's rules was developed by Shen [48], which can be used for making two-stage MPI inferences. Within other contexts (e.g. cross-sectional survey data with nonresponse), sequential imputation strategies are also seen; see Harel [49] and Rubin [50].

3.2. MPI via MCMC

One does not need to subscribe to the Bayesian paradigm in developing a 'proper' imputation method, so long as the method satisfies a set of technical conditions [46] that guarantee frequency-valid inferences; an example of non-Bayesian imputation appears in Section 3.4. This set of conditions, however, is useful for evaluating the properties of a given method but provides little practical guidance in devising one. For this reason, a Bayesian process is often preferred. By specifying a parametric model based on the full-likelihood function and placing prior distributions on the unknown model parameters, we can simulate multiple independent draws from the conditional distribution of the missing data given the observed data using Bayes' theorem. For the selection and REMTM models, this conditional distribution is usually too complicated to be simulated directly. As a solution, a collection of MCMC algorithms can be used. Within MCMC, parameters are drawn from a complicated distribution by forming a Markov chain that has this distribution as its stationary distribution. One of the most popular MCMC methods is Gibbs sampling, which simulates the conditional distribution of each component of a multivariate random variable given the other components in a cyclic manner. A series of Gibbs samplers have been developed and implemented in software packages, such as R/S-plus and SAS, to deal with various types of incomplete multivariate data [13].

3.2.1. Gibbs sampling algorithm for MPI

Conceptually, one of the Gibbs sampling algorithms dealing with multivariate normal data [13] can be modified to conduct partial imputations within MPI. By iterating the following two steps, we have the Gibbs sampling algorithm for creating imputations for intermittent missing values:

  1. I-step: Draw values of the intermittent missing data from their conditional predictive distribution, i.e. for i = 1, …, N,
    $y_i^{IM(t+1)} \sim f(y_i^{IM} \mid y_i^{obs}, X_i, r_i, \psi^{(t)})$
  2. P-step: Conditioning on the drawn values for the intermittent missing data ($Y^{IM(t+1)}$), draw the parameters $\psi$ from their posterior distribution based on the partially imputed data,
    $\psi^{(t+1)} \sim f(\psi \mid Y^{obs}, Y^{IM(t+1)}, X, R)$

In this algorithm, $\psi = (\theta, \phi)$ represents the vector of all parameters in the model chosen to characterize the complete data and the missingness mechanism. If nonignorability is assumed for the intermittent missingness, we can use a selection or REMTM model. Although detailed model formulations for nonignorable intermittent missingness are not given in this article, they are conceptually natural extensions of the models given in Section 2.3. For ignorable missingness, we can choose a linear mixed-effects or a transition model. By Bayes' theorem, $f(\psi \mid Y^{obs}, Y^{IM(t+1)}, X, R) \propto f(\psi)\, f(Y^{obs}, Y^{IM(t+1)}, X, R \mid \psi)$, where $f(\psi)$ represents the prior distribution for $\psi$. Depending on the choice of modeling strategy, the conditional predictive distribution $f(y_i^{IM} \mid y_i^{obs}, X_i, r_i, \psi)$ takes different forms.

Starting from an initial value $\psi^{(0)}$ (i.e. t = 0), which can be any reasonable value obtained by a convenient method, the above I-step and P-step are repeated for a sufficiently large number of iterations to yield a stochastic sequence $\{(\psi^{(t)}, Y^{IM(t)}): t = 1, \ldots, T\}$. Provided that certain regularity conditions [51] hold, the empirical joint distribution of the parameters and missing values within this sequence approaches the stationary distribution $f(\psi, Y^{IM} \mid Y^{obs})$ as $T \to \infty$. In practice, we monitor the convergence properties of the process [52]. If diagnostics suggest that convergence is achieved after $T_0$ iterations, we retain the simulated missing values every $(T - T_0)/m$ iterations starting from $t = T_0 + 1$ and treat them as multiple partial imputations, which can be viewed as approximately independent for large T. In this way, m > 1 sets of partial imputations are obtained, and each can be further analyzed to deal with the dropouts.
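The I-step/P-step cycle can be illustrated on a deliberately simple toy problem: bivariate normal data with known covariance, a flat prior on the mean, and missingness only in the second component. This is our own simplification for illustration; the models in the text are far richer:

```python
import numpy as np

rng = np.random.default_rng(0)
N, rho = 50, 0.6
Sigma = np.array([[1.0, rho], [rho, 1.0]])
y = rng.multivariate_normal([1.0, 2.0], Sigma, size=N)
miss = rng.random(N) < 0.3          # ~30% of second components missing
y[miss, 1] = np.nan

mu = np.zeros(2)                    # psi^(0): any reasonable starting value
for t in range(2000):
    # I-step: draw y_i2 | y_i1, mu from the conditional normal
    cond_mean = mu[1] + rho * (y[miss, 0] - mu[0])
    cond_sd = np.sqrt(1 - rho ** 2)
    y[miss, 1] = cond_mean + cond_sd * rng.standard_normal(miss.sum())
    # P-step: draw mu | completed data (flat prior gives N(ybar, Sigma/N))
    mu = rng.multivariate_normal(y.mean(axis=0), Sigma / N)
```

After a burn-in period, retaining the imputed values from every kth iteration would give the m approximately independent partial imputations described above.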

3.2.2. Gibbs sampling algorithm for two-stage MPI

A two-stage MPI procedure conducts a second round of imputation for the dropouts within each partially imputed data set, $(Y^{obs}, Y^{IM(j)})$ (j = 1, …, m). With the same flow structure as the sampler in Section 3.2.1, a Gibbs sampler for two-stage MPI also consists of two steps at each iteration:

  1. I-step: Draw the missing values due to dropout from their conditional predictive distribution, i.e.
    $y_i^{DM(t+1)} \sim f(y_i^{DM} \mid y_i^{obs}, y_i^{IM(j)}, X_i, r_i, \psi^{(t)})$, i = 1, …, N
  2. P-step: Conditioning on the drawn values for the dropouts ($Y^{DM(t+1)}$), draw the parameters $\psi$ from their complete-data posterior distribution,
    $\psi^{(t+1)} \sim f(\psi \mid Y^{obs}, Y^{IM(j)}, Y^{DM(t+1)}, X, R)$

The model used in this algorithm aims at characterizing data subject only to dropout, since the intermittent missing values have been imputed in the first stage. Therefore, the parameters $\psi$ differ from those in the Gibbs sampler of Section 3.2.1, although the same symbol is used for presentation. A complete run of two-stage MPI requires running both the above sampler and the one in Section 3.2.1, with the imputation of $Y^{DM}$ nested within each partially imputed data set $(Y^{obs}, Y^{IM(j)})$ (j = 1, …, m). By running this algorithm long enough, we can obtain n > 1 versions of imputations on $Y^{DM}$ from the sampled stochastic sequence and end up with $m \times n$ sets of complete data: $\{(Y^{obs}, Y^{IM(j)}, Y^{DM(j,k)}): j = 1, \ldots, m;\ k = 1, \ldots, n\}$. Each can be analyzed using standard longitudinal models.

3.3. Implementing the selection and REMTM models within MPI

For the selection or REMTM model, the full-likelihood function can in principle be formulated and evaluated to make likelihood-based inferences. Unfortunately, the computational cost is very high: calculating the dropout probabilities in the selection model, or integrating over the random effects in the REMTM, demands time-consuming numerical methods. Bayesian inference based on MCMC again provides an affordable alternative for fitting these models. By summarizing the sampled parameter values, the posterior distribution is obtained directly, allowing exact inferences that do not rely on large-sample approximation theory.

3.3.1. A Gibbs sampler for fitting a selection model with structured covariance matrix

Recently, we developed a hybrid Gibbs sampling algorithm for selection models with structured covariance matrices [22]. The algorithm for such a model with an AR(1) covariance structure is presented here. For complete continuous repeated measures with a multivariate normal distribution, $y_i \sim N(X_i\beta, \Sigma(\alpha))$, the AR(1) structure gives $\mathrm{cov}(y_{ij}, y_{ik}) = \sigma^2\rho^{|j-k|}$. It follows that $f(y_i^{obs} \mid \theta) = \prod_{j=1}^{d_i-1} f(y_{ij} \mid H_{ij}, \theta)$ is a marginal multivariate normal density with the same AR(1) covariance structure (just of lower dimension). After specifying a prior distribution $f(\psi)$ for $\psi = (\theta^T, \phi^T)^T$, where $\theta = (\beta^T, \alpha^T)^T$ and $\phi = (\varphi_0, \varphi_1, \ldots, \varphi_J)^T$, the posterior distribution for the model parameters is

$P(\psi \mid Y, R) \propto \left[\prod_{i=1}^{N}\left\{f(y_i^{obs} \mid \theta) \times \left(\prod_{j=1}^{d_i-1}[1 - p_j(y_{ij}, H_{ij})]\right) \times \Pr(r_{i,d_i} = 1 \mid H_{i,d_i})\right\}\right] \times f(\psi)$

This function is fairly complicated, and the integration within $\Pr(r_{i,d_i} = 1 \mid H_{i,d_i}) = \int \Pr(r_{i,d_i} = 1 \mid y, H_{i,d_i}, \phi)\, f_{i,d_i}(y \mid H_{i,d_i}, \theta)\, dy$ requires numerical solutions. Using a Gibbs sampler, we can alleviate the computation by first imputing the missing values at withdrawal (i.e. $\{y_{i,d_i}: i = 1, \ldots, N\}$) and then drawing the parameters one by one conditionally on the observed and imputed data. More specifically, we have the following steps at each iteration:

  1. I-step: Draw the missing values at withdrawal $\{y_{i,d_i}: i = 1, \ldots, N\}$:
    $y_{i,d_i} \sim f_{i,d_i}(y \mid y_i^{obs}, x_{i,d_i}, d_i, \psi) \propto \frac{1}{\sqrt{2\pi v_i}}\exp\left\{-\frac{(y - x_{i,d_i}\beta - \mu_i)^2}{2v_i}\right\} \times \frac{\exp(\varphi_0 + \varphi_1 y + \sum_{k=2}^{d_i}\varphi_k y_{i,d_i+1-k})}{1 + \exp(\varphi_0 + \varphi_1 y + \sum_{k=2}^{d_i}\varphi_k y_{i,d_i+1-k})}$
    where, by regressing $y_{i,d_i}$ on $y_i^{obs}$, we have $\mu_i = C(d_i-1)^T\Sigma_{d_i-1}^{-1}(y_i^{obs} - X_i\beta)$, $v_i = \sigma^2(1 - C(d_i-1)^T\Sigma_{d_i-1}^{-1}C(d_i-1))$, and $C(j) = (\rho^j, \rho^{j-1}, \ldots, \rho)^T$. For the AR(1) covariance structure, the inverse of the covariance matrix of $y_i^{obs}$ (i.e. $\Sigma_{d_i-1}^{-1}$) has an analytical expression.
  2. P-step: Draw the parameters one by one in the following order:
    1. For k = 1, …, K, draw the regression coefficients:
      $\beta_k \sim f(\beta_k \mid \psi\backslash\beta_k, Y^*, X, R) \propto \prod_{i=1}^{N}\exp\left\{-\frac{(y_i^* - X_i\beta)^T\Sigma_{d_i}^{-1}(y_i^* - X_i\beta)}{2}\right\}$
      where $y_i^* = (y_{i1}, \ldots, y_{i,d_i})^T$, $Y^* = (y_1^*, \ldots, y_N^*)^T$, and $\psi\backslash\beta_k$ denotes all parameters except $\beta_k$ ('\' means 'except', and similarly below).
    2. Draw the covariance parameter:
      $\rho \sim f(\rho \mid \psi\backslash\rho, Y^*, X, R) \propto \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi|\Sigma_{d_i}|}}\exp\left\{-\frac{(y_i^* - X_i\beta)^T\Sigma_{d_i}^{-1}(y_i^* - X_i\beta)}{2}\right\}$
    3. Draw the variance of the residuals:
      $\sigma^2 \sim f(\sigma^2 \mid \psi\backslash\sigma^2, Y^*, X, R) \propto \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi|\Sigma_{d_i}|}}\exp\left\{-\frac{(y_i^* - X_i\beta)^T\Sigma_{d_i}^{-1}(y_i^* - X_i\beta)}{2}\right\}$
    4. For k = 0, …, J, draw the parameters of the dropout mechanism:
      $\varphi_k \sim f(\varphi_k \mid \psi\backslash\varphi_k, Y^*, X, R) \propto \prod_{i=1}^{N}\left\{\left[\prod_{j=2}^{d_i-1}\frac{1}{1 + \exp(\varphi_0 + \varphi_1 y_{ij} + \sum_{k=2}^{j}\varphi_k y_{i,j+1-k})}\right] \times \frac{\exp(\varphi_0 + \varphi_1 y_{i,d_i} + \sum_{k=2}^{d_i}\varphi_k y_{i,d_i+1-k})}{1 + \exp(\varphi_0 + \varphi_1 y_{i,d_i} + \sum_{k=2}^{d_i}\varphi_k y_{i,d_i+1-k})}\right\}$

In presenting the above algorithm, we have assumed noninformative prior distributions for all parameters: normal distributions with infinite variance for $\beta_k$, $\varphi_k$, and $\log(\sigma^2)$, and a uniform distribution on $-1 < \rho < 1$. The algorithm is a hybrid Gibbs sampler within which other MCMC sampling methods, such as Metropolis–Hastings, can be applied to simulate the conditional distribution at each step. As shown by Yang and Li [53], the conditional distributions of all sub-steps (except the one for $\rho$) have log-concave forms and thus can be simulated using an efficient method called adaptive rejection sampling [54]. A convenient sampling method for $\rho$ is the Metropolis–Hastings algorithm.
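To make the regression in the I-step concrete, the following Python sketch evaluates $\mu_i$ and $v_i$ numerically for a small AR(1) example (with the $\beta$-related terms dropped for simplicity; the numbers are illustrative) and checks the result against the Markov-property shortcut:

```python
import numpy as np

def ar1_cov(d, sigma2, rho):
    """AR(1) covariance matrix: cov(y_j, y_k) = sigma2 * rho^|j-k|."""
    j = np.arange(d)
    return sigma2 * rho ** np.abs(np.subtract.outer(j, j))

sigma2, rho, d = 2.0, 0.7, 4          # d-1 = 3 observed values; impute y_d
Sigma = ar1_cov(d, sigma2, rho)
C = Sigma[:d - 1, d - 1]              # cov(y_obs, y_d), i.e. sigma2 * C(d-1)
Sinv = np.linalg.inv(Sigma[:d - 1, :d - 1])
y_obs = np.array([0.5, 1.2, 0.9])     # centered observed values (y - X*beta)

mu = C @ Sinv @ y_obs                 # conditional mean of y_d
v = sigma2 - C @ Sinv @ C             # conditional variance of y_d
# Under AR(1), the Markov property gives the same answer directly:
assert np.isclose(mu, rho * y_obs[-1])
assert np.isclose(v, sigma2 * (1 - rho ** 2))
```

The built-in checks illustrate why the text notes that $\Sigma_{d_i-1}^{-1}$ has an analytical form: for AR(1), conditioning collapses to dependence on the last observed measurement only.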

3.3.2. A Gibbs sampler for fitting REMTM

The REMTM characterizes both the continuous measurement process and the dropout mechanism as Markov processes. Since the dropout indicators and the repeated measures are conditionally independent given the shared random effects, it is sufficient in the imputation step to sample only the random effects (which can be viewed as a special group of 'missing data'). As before, we use $\psi = (\theta^T, \phi^T)^T$ to denote all parameters, with $\theta = (\alpha, \beta^T, \sigma_\xi^2, \sigma_\varepsilon^2)^T$ and $\phi = (\eta, \gamma)^T$. A Gibbs sampler for the REMTM consists of the following steps:

  1. I-step: For i = 1, …, N, draw the random intercepts:
    $\xi_i \sim f(\xi_i \mid y_i^{obs}, x_i, \psi) \propto \prod_{j=2}^{d_i-1}\exp\left\{-\frac{(y_{ij} - (x_{ij}\beta + (y_{i,j-1} - x_{i,j-1}\beta)\alpha + \xi_i))^2}{2\sigma_\varepsilon^2}\right\} \times \prod_{j=2}^{d_i-1}\frac{1}{1 + \exp(x_{ij}\eta + \xi_i\gamma)} \times \frac{\exp(x_{i,d_i}\eta + \xi_i\gamma)}{1 + \exp(x_{i,d_i}\eta + \xi_i\gamma)} \times \exp\left\{-\frac{\xi_i^2}{2\sigma_\xi^2}\right\}$
  2. P-step: Draw the parameters one by one in the following order:
    1. For k = 1, …, K, draw the fixed-effects regression coefficients in the measurement model:
      $\beta_k \sim f(\beta_k \mid \psi\backslash\beta_k, \xi, Y^*, X, R) \propto \prod_{i=1}^{N}\prod_{j=2}^{d_i-1}\exp\left\{-\frac{(y_{ij} - (x_{ij}\beta + (y_{i,j-1} - x_{i,j-1}\beta)\alpha + \xi_i))^2}{2\sigma_\varepsilon^2}\right\}$
    2. Draw the transition parameter in the measurement model:
      $\alpha \sim f(\alpha \mid \psi\backslash\alpha, \xi, Y^{obs}, X, R) \propto \prod_{i=1}^{N}\prod_{j=2}^{d_i-1}\exp\left\{-\frac{(y_{ij} - (x_{ij}\beta + (y_{i,j-1} - x_{i,j-1}\beta)\alpha + \xi_i))^2}{2\sigma_\varepsilon^2}\right\}$
    3. Draw the variance of the residuals in the measurement model:
      $\sigma_\varepsilon^2 \sim f(\sigma_\varepsilon^2 \mid \psi\backslash\sigma_\varepsilon^2, \xi, Y^{obs}, X, R) \propto \prod_{i=1}^{N}\prod_{j=2}^{d_i-1}\exp\left\{-\frac{(y_{ij} - (x_{ij}\beta + (y_{i,j-1} - x_{i,j-1}\beta)\alpha + \xi_i))^2}{2\sigma_\varepsilon^2}\right\}$
    4. Draw the variance of the random intercepts:
      $\sigma_\xi^2 \sim f(\sigma_\xi^2 \mid \psi\backslash\sigma_\xi^2, \xi, Y^{obs}, X, R) \propto \prod_{i=1}^{N}\exp\left\{-\frac{\xi_i^2}{2\sigma_\xi^2}\right\}$
    5. For k = 1, …, K, draw the regression coefficients in the dropout model:
      $\eta_k \sim f(\eta_k \mid \psi\backslash\eta_k, \xi, Y^{obs}, X, R) \propto \prod_{i=1}^{N}\left[\prod_{j=2}^{d_i-1}\frac{1}{1 + \exp(x_{ij}\eta + \xi_i\gamma)} \times \frac{\exp(x_{i,d_i}\eta + \xi_i\gamma)}{1 + \exp(x_{i,d_i}\eta + \xi_i\gamma)}\right]$
    6. Draw the parameter indicating nonignorability:
      $\gamma \sim f(\gamma \mid \psi\backslash\gamma, \xi, Y^{obs}, X, R) \propto \prod_{i=1}^{N}\left[\prod_{j=2}^{d_i-1}\frac{1}{1 + \exp(x_{ij}\eta + \xi_i\gamma)} \times \frac{\exp(x_{i,d_i}\eta + \xi_i\gamma)}{1 + \exp(x_{i,d_i}\eta + \xi_i\gamma)}\right]$

Again, noninformative priors are used in the above algorithm [14]. It can be shown that all the above conditional distributions have log-concave forms, directly or after transformation. Thus, all conditional distributions can be simulated using the method of adaptive rejection sampling.

3.3.3. Implementing the selection and REMTM models within MPI and two-stage MPI

Within MPI, for each partially imputed data set, one of the above Gibbs samplers can be used to fit a selection or REMTM model to deal with dropouts. Simulated values of parameters can be summarized to obtain parameter estimates for each data set, which are then combined to make an MPI inference. An illustration of the strategy is seen in Sections 4.1 and 4.3.

For the two-stage MPI strategy, the above Gibbs samplers for the selection and REMTM models can be modified to create multiple imputations for the dropouts. For the selection-model Gibbs sampler, we only need to modify the I-step to draw all missing values after withdrawal, $\{(y_{i,d_i}, \ldots, y_{iJ}): i = 1, \ldots, N\}$, instead of only those at withdrawal (i.e. $\{y_{i,d_i}: i = 1, \ldots, N\}$). This is straightforward because the multivariate normal family is closed under conditioning, i.e. the conditional distribution of $(y_{i,d_i}, \ldots, y_{iJ})^T$ given $y_i^{obs}$ is also normal. For the REMTM Gibbs sampler, we need to add a sub-step within the I-step to draw the missing values given the currently drawn random effects and the previously drawn parameters, that is, $y_i^{DM} \sim f(y_i^{DM} \mid \psi, \xi, y_i^{obs}, y_i^{IM(j)}, X_i, d_i)$. Again, this conditional distribution is multivariate normal, with parameters derived by regressing $y_i^{DM}$ on $(y_i^{obs}, y_i^{IM(j)})$ (see the I-step in Section 3.3.1), where $y_i^{IM(j)}$ refers to the jth imputation of the intermittent missing values (j = 1, …, m). Running the modified Gibbs sampler for the selection or REMTM model, we can generate $m \times n$ complete data sets to be analyzed using a standard longitudinal model.

3.4. Implementing pattern-mixture models within MPI

As seen in Section 2.3.2, there are at least three schemes for identifying restrictions on the parameters of pattern-mixture models: CCMV, NCMV, and ACMV. MPI or two-stage MPI provides a convenient framework for implementing these identification schemes. Assuming that the intermittent missing values have been imputed multiple times in a previous round, we now use the pattern-mixture model to create imputations for the dropouts without employing the Bayesian paradigm.

First, we fit a model to the pattern-specific identifiable densities $f_j(y_1, \ldots, y_j)$ (j = 2, …, J, indexing the observed dropout times in the data set) and obtain maximum likelihood estimates $\hat\theta^{(j)}$. Second, we select an identification scheme to determine the conditional distributions of the unobserved measurements given the observed ones: $f_{d_i}(y_{is} \mid y_{i1}, \ldots, y_{i,d_i-1})$ (i = 1, …, N; s = d_i, …, J). As seen in Section 2.3.2, each such conditional distribution is a mixture of known normal densities for continuous repeated measures. An easy way to simulate values from the mixture distribution is to randomly select a component of the mixture (according to the weights $\omega_{st}$ defined in Section 2.3.2) and then draw from it. For details of implementation and examples, see Thijs et al. [31] and Section 4.2.
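The component-then-draw trick for simulating from a normal mixture can be sketched as follows (the weights and moments below are made up for illustration; in practice they come from the chosen identification scheme, e.g. CCMV, NCMV, or ACMV):

```python
import numpy as np

rng = np.random.default_rng(1)
omegas = np.array([0.5, 0.3, 0.2])    # mixture weights (must sum to 1)
means = np.array([1.0, 2.0, 3.0])     # component conditional means
sds = np.array([0.4, 0.5, 0.6])       # component conditional std deviations

def draw_from_mixture():
    s = rng.choice(len(omegas), p=omegas)   # pick a component by its weight
    return rng.normal(means[s], sds[s])     # then draw from that component

# One imputation per missing measurement; repeated n times for two-stage MPI.
imputations = [draw_from_mixture() for _ in range(5)]
```

This avoids ever evaluating the mixture density itself, which is what makes the scheme cheap even when the number of patterns is large.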


In this section, we use the carbon monoxide data to illustrate the above imputation-based strategies. To deal with intermittent missing values, ignorability was assumed when generating MPIs. Then, for each partially imputed data set, we dealt with the possibly nonignorable dropouts using the selection, pattern-mixture, and REMTM models.

4.1. Application of the selection model with AR(1) covariance structure

Using a piecewise linear mixed-effects model under an ignorability assumption on missingness, Shoptaw et al. [21] reported a significant treatment effect of CM. Here, we reanalyzed a subset of the data starting from the second week using the MPI strategy. After a logarithmic transformation, the repeated carbon monoxide levels for each participant were viewed as multivariate normally distributed (i.e. $y_i \sim N(\mu, \Sigma)$). Specifying a normal prior distribution for the mean vector (i.e. $\mu \mid \Sigma \sim N(\mu_0, \tau^{-1}\Sigma)$) and an inverted Wishart distribution for the covariance matrix (i.e. $\Sigma \sim W^{-1}(r, \Lambda)$), we created MPIs using the SAS procedure PROC MI with the monotone-missingness option [55]. This procedure adopts a Gibbs sampling algorithm similar to the one described in Section 3.2.1. Since no prior information was available, Jeffreys' invariance principle was used to derive the noninformative form of the normal-inverse-Wishart prior distribution; for details of implementation, see Chapter 6 of Schafer [13]. The expectation-maximization (EM) algorithm was first run to obtain the maximum likelihood estimates (i.e. $\hat\mu$, $\hat\Sigma$), which served as the starting point for the MPI Gibbs sampler. Various diagnostic tools suggested that the procedure converged within 200 iterations. By simulating one chain of parameters and missing values for a total of T = 2200 iterations and treating the first $T_0 = 200$ iterations as the burn-in period, four sets of imputations of the intermittent missing values were obtained at intervals of 500 iterations.

For each partially imputed data set, we applied the selection model to analyze the transformed carbon monoxide levels. As seen in Figure 1, the mean carbon monoxide levels decline quickly within the first week from approximately the same starting levels and then level off at different values throughout the rest of the study period. For each partially imputed data set, we used PROC MIXED in SAS to fit linear mixed models with various predictors and covariance structures. Model comparison with AIC supported the following mean structure with AR(1) covariance in all four data sets. Thus, the AR(1) selection model was used with the following mean structure for the carbon monoxide levels:


where CMi and RPi, respectively, indicate whether the ith smoker received CM or RP, BaseCOi indicates baseline carbon monoxide level, and Patchesi represents the number of nicotine patches the smoker received during the study. To model the dropout process, the following logistic regression model was selected after sequential model comparisons:

$\mathrm{logit}(p_{d_i}(y_{i,d_i}, H_{i,d_i})) = \varphi_0 + \varphi_1 y_{i,d_i} + \varphi_2 y_{i,d_i-1}$

where di indicates the dropout time of the ith smoker (i = 1, …, 174).

By running the Gibbs sampler with noninformative priors for the AR(1) selection model, we obtained estimates of all parameters for each partially imputed data set. Only the parameters of primary interest are listed in Table I, from which we see that the between-imputation variance is very small for each parameter; in other words, the fraction of missing information due to intermittent missingness is low. After consolidating the four sets of estimates using Rubin's rules, the treatment effect of CM is clearly significant ($\hat\beta_1 = -0.28$; $t_{2490} = -5.88$ with p < 0.0001). RP turns out to be ineffective, and there is no significant interaction between CM and RP. The regression coefficient $\varphi_1$ is significantly larger than zero ($\hat\varphi_1 = 1.28$; $t_{2024} = 3.86$ with p = 0.0002), suggesting that the higher the underlying missing value, the larger the probability of dropping out. In other words, the dropouts are possibly outcome-dependent nonignorable.

Table I
Estimated treatment effects and parameters of the dropout model for the four partially imputed carbon monoxide data sets.

4.2. Application of pattern-mixture models and two-stage MPI

For the purpose of demonstrating pattern-mixture models, only the efficacy of CM is investigated in the following analyses. We first clustered participants into two groups: completers (n1 = 112) and early terminators (n2 = 62). Then, within each group, the efficacy of CM was investigated. As seen in Figure 3, CM appears to be less effective for the early terminators. A linear mixed model with AR(1) covariance structure (with predictors $CM_i$, $BaseCO_i$, and $Patches_i$) was selected for analyzing the carbon monoxide levels starting from the second week,

Figure 3
Mean carbon monoxide levels for completers and early terminators. By dividing the 174 smokers into two groups: completers (n1 = 112) and early terminators (n2 = 62), the mean curves of carbon monoxide levels for subjects receiving CM (contingency management) ...

This model was applied separately to the completers and the early terminators. Let $\hat\beta_1^c$ and $\hat\beta_1^w$ denote the point estimators of $\beta_1$ for the completers and the early terminators, respectively, and let $\hat\pi_c = 64$ per cent denote the estimated probability of being a completer; then the overall point estimator is the weighted average $\hat\beta_1 = \hat\pi_c\hat\beta_1^c + (1 - \hat\pi_c)\hat\beta_1^w$, with variance derived using the delta method [56].
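One common form of this delta-method variance, under the assumption that $\hat\beta_1^c$, $\hat\beta_1^w$, and $\hat\pi_c$ are mutually independent, can be sketched as follows (the numerical inputs below are illustrative, not those of the trial):

```python
import math

def pattern_average(beta_c, var_c, beta_w, var_w, pi_c, n_total):
    """Pattern-averaged estimate and delta-method standard error,
    assuming the three estimators are independent (our assumption)."""
    beta = pi_c * beta_c + (1 - pi_c) * beta_w
    var_pi = pi_c * (1 - pi_c) / n_total       # binomial variance of pi-hat
    var = (pi_c ** 2 * var_c                   # completer-pattern term
           + (1 - pi_c) ** 2 * var_w           # terminator-pattern term
           + (beta_c - beta_w) ** 2 * var_pi)  # uncertainty in the weight
    return beta, math.sqrt(var)

# Illustrative inputs: pattern-specific estimates, variances, weight, N
beta, se = pattern_average(-0.35, 0.01, -0.10, 0.02, 0.64, 174)
```

The third term shows why a large between-pattern difference $\hat\beta_1^c - \hat\beta_1^w$ inflates the overall uncertainty even when each pattern-specific estimate is precise.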

Since the fraction of missing information due to intermittent missingness was low, only three sets of imputations for the intermittent missing values were created this time, using the partial-imputation procedure described in Section 4.1. The pattern-averaged point estimates and standard errors for $\beta_1$ are listed in Table II. After consolidation, the overall point estimate is $\hat{\bar\beta}_1 = 0.25$ with standard deviation $\sqrt{\mathrm{var}(\hat{\bar\beta}_1)} = 0.13$. The test based on the t-statistic gives a p-value of 0.06.

Table II
Estimated treatment effect of contingency management ([beta]1 (SD)) using the pattern-mixture model with two patterns (complete versus dropout).

In the above preliminary analysis, a simple pattern-mixture modeling strategy with only two dropout patterns (complete versus incomplete) was used within the framework of MPI. In the following, we describe the application of restriction-identification strategies within the two-stage MPI framework for pattern-mixture models with a larger number of dropout patterns.

When the number of dropout patterns becomes large, applying pattern-mixture models without imputation becomes less attractive. For example, the mean profiles of carbon monoxide levels across five dropout patterns are plotted in Plate 1, which shows notable within-pattern and across-pattern variability in the trajectories of the carbon monoxide levels. As the number of patterns increases, the number of subjects within each pattern shrinks, and it becomes tedious (even infeasible) to conduct pattern-specific analyses and then combine the results across patterns as we did above.

Adopting the procedure described in Section 3.4, the three restriction schemes (CCMV, NCMV, and ACMV) were used to make multiple imputations for the dropouts. Within this two-stage MPI, the numbers of imputations were set at m = 2 for the intermittent missing values and n = 3 for the dropouts. Again, the Gibbs sampling process with the multivariate normal assumption was used to impute the intermittent missing values (see Section 4.1). Hence, we ended up with a total of six complete data sets, each of which was analyzed using a linear mixed model with AR(1) covariance structure and predictors CM, BaseCO, and Patches. Using the consolidation procedure for two-stage MPI described in Section 3.1.2, the final point estimates and fractions of missing information for the treatment effect of CM are shown in Table III, together with the p-values of one-sided hypothesis tests based on the t-statistics. From these results, we see that the fraction of missing information due to dropout is much higher than that due to intermittent missingness. Two of the three identification strategies strongly support the favorable treatment efficacy of CM.

Table III
Estimated treatment effect of contingency management using the pattern-mixture models within the framework of two-stage MPI.

4.3. Application of the REMTM

We reanalyzed the same subset of the carbon monoxide levels from the four treatment groups using the REMTM as we did with the selection model. The carbon monoxide data after dichotomization were analyzed by Yang et al. [11] using an REMTM for binary repeated measures subject to intermittent missingness and dropout. Here, the continuous data were analyzed using the hybrid Gibbs sampler for the REMTM with predictors $CM_i$, $RP_i$, and $RP_i*CM_i$. Using the partially imputed data sets created in Section 4.1, Table IV reports the combined posterior means, standard deviations, and 95 per cent credible intervals (CI) for all model parameters.

Table IV
Posterior parameter estimates with standard deviations and 95 per cent credible intervals from the REMTM applied to the continuous carbon monoxide data.

The estimates of $\sigma_\xi^2$ and $\gamma$ jointly suggest that dropout is random-effects dependent, hence nonignorable. The random-intercept effects ($\xi_i$'s) capture the heterogeneity in dropout and carbon monoxide levels across subjects. None of the estimated components of $\eta$ is significantly different from zero. The significantly positive estimate of $\alpha$ suggests that an individual's current carbon monoxide level depends positively on the previous one, which is reasonable and consistent with the result from the selection model. Most interestingly, the fitted REMTM confirmed a strongly favorable treatment efficacy of CM in reducing carbon monoxide levels ($\hat\beta_2 = -0.25$ with 95 per cent CI = (−0.46, −0.05)). This result is also consistent with that of the selection model, where $\hat\beta_2 = -0.28$.


This paper introduces alternative imputation-based strategies for implementing longitudinal models with full-likelihood functions to deal with intermittent missing values and dropouts that are potentially nonignorable. Using the carbon monoxide data set from a smoking cessation clinical trial, we have demonstrated the application of MPI and two-stage MPI to implement selection, pattern-mixture, and shared-random-effects models. We emphasize that the MPI or two-stage MPI framework provides a very flexible solution for incomplete longitudinal data analysis. When drawing imputations of the intermittent missing values, various modeling options can be employed, depending on the assumed missingness mechanism (e.g. MCAR, MAR, or nonignorable). Although the formulation of selection, pattern-mixture, and shared-parameter models is presented here only for repeated measures subject to dropout, similar ideas can conceivably be developed for intermittent missingness. When handling dropouts, another group of advanced modeling options is available, and the models used for the two types of missingness can be entirely different: for example, a selection model can be used for imputing intermittent missing values while a pattern-mixture model is used for imputing dropouts. It is even possible for the imputation of intermittent missing values and the analysis of dropouts to be conducted by different analysts at different places and times.

Another advantage of imputation-based strategies concerns sensitivity analysis. As discussed earlier, a notable limitation of incomplete data analysis is that the true models and mechanisms for the measurements and missing values (including dropouts) are usually unverifiable in practice. When making MPI inferences, various combinations of modeling schemes can be implemented under various assumptions; MPI therefore provides a useful tool for studying the sensitivity of model-based conclusions. If, for one data set, different MPI modeling strategies lead to inconsistent conclusions about the efficacy of the same treatment or intervention, further investigation should be conducted. We should avoid confidently reporting a strong conclusion based on only one modeling option while other options suggest conflicting results. For the smoking-trial data, we applied various models to analyze the treatment efficacy of two behavioral therapies, CM and RP; the overall results paint a consistent picture supporting the favorable efficacy of CM.

It should also be noted that selection, pattern-mixture, and shared-parameter models are generalized versions of standard longitudinal models (i.e. marginal models using GEE, linear mixed-effects models, and transition models). For example, a linear mixed-effects model that ignores the missingness mechanism can be viewed as a selection model under the MAR assumption. Although only continuous repeated measures are targeted in this article, the modeling strategies based on the full-likelihood function can be extended to other formats of repeated measures.

When describing the various Gibbs sampling algorithms for model fitting or multiple imputation, we have tried to present the technical implementation in as much detail as possible, but a full description of all possible modeling techniques is not feasible; most technical details appear in the user manual and technical report of the MPI software package. When specifying the prior distributions for the parameters, we chose noninformative priors because no historical data or empirical evidence was available to guide the elicitation of proper priors. As raised by an anonymous reviewer, flat priors can be problematic in practice, especially for parameters related to the missingness or dropout mechanism; this is most notable when there are very few missing values, which provide very little information for estimating the missingness-related parameters. For our carbon monoxide data set, fortunately, this was not an issue. When monitoring the convergence of the Gibbs samplers, we relied on two main approaches. The first was to generate two or more Markov chains from different initial values and wait until they interweaved with each other and the between-chain variance was much smaller than the within-chain variance. The second was to work with one chain, retaining only every kth sample after the burn-in period, with k set large enough (e.g. 10) that the retained samples are approximately independent. Usually, we stopped the simulation when the quantiles of all or selected parameters were stable.
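The first monitoring approach can be quantified with a Gelman-Rubin-style statistic comparing between-chain and within-chain variance. The sketch below is a minimal version of that diagnostic (the threshold of values near 1 is conventional; the simulated chains are purely illustrative):

```python
import numpy as np

def psrf(chains):
    """Potential scale reduction factor for an (m, T) array of m chains."""
    chains = np.asarray(chains)
    m, t = chains.shape
    chain_means = chains.mean(axis=1)
    b = t * chain_means.var(ddof=1)            # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_plus = (t - 1) / t * w + b / t         # pooled variance estimate
    return np.sqrt(var_plus / w)               # values near 1: converged

rng = np.random.default_rng(2)
chains = rng.standard_normal((4, 1000))        # four well-mixed toy chains
r_hat = psrf(chains)
```

Chains still exploring different regions of the posterior give $\hat{R}$ well above 1, matching the requirement in the text that the between-chain variance be much smaller than the within-chain variance before stopping.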

Figure 4
Plate 1. Pattern-dependent distribution of carbon monoxide levels. Using the software package named ‘MPI 2.0’, profiles and mean curves of carbon monoxide levels are drawn within each of the five groups determined by the dropout times: ...


This work was supported by the National Institute of Drug Abuse through an SBIR contract N44 DA35513 and three research grants: R03 DA016721, R01 DA09992, and P50 DA18185. We especially thank Hamutahl Cohen for her editorial assistance and the reviewers for their constructive comments.


1. Nich C, Carroll KM. ‘Intention-to-treat’ meets ‘missing data’: implications of alternate strategies for analyzing clinical trials data. Drug and Alcohol Dependence. 2002;68:121–130. [PubMed]
2. Hedeker D, Gibbons RD. A random effects ordinal regression model for multilevel analysis. Biometrics. 1994;50:933–944. [PubMed]
3. Follmann D, Wu M. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168. [PubMed]
4. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
5. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974. [PubMed]
6. Diggle P, Heagerty P, Liang K-Y, Zeger S. Analysis of Longitudinal Data. 2. Oxford University Press; Oxford: 2002.
7. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2. Wiley; New York: 2002.
8. Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
9. Robins JM, Rotnitzky AG, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–120.
10. Diggle PJ, Kenward MG. Informative drop-out in longitudinal data analysis. Applied Statistics. 1994;43(1):49–93.
11. Yang X, Nie K, Belin T, Liu J, Shoptaw S. Markov transition models for binary repeated measures with ignorable and nonignorable missing values. Statistical Methods in Medical Research. 2007;16(4):347–364. [PubMed]
12. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer; New York: 2000.
13. Schafer JL. Analysis of Incomplete Multivariate Data. Chapman & Hall; London: 1997.
14. Li J, Yang X, Wu Y, Shoptaw S. A random-effects Markov transition model for Poisson-distributed repeated measures with nonignorable missing values. Statistics in Medicine. 2007;26(12):2519–2532. [PubMed]
15. Molenberghs G, Michiels B, Lipsitz SR. Selection models and pattern-mixture models for incomplete categorical data with covariates. Biometrics. 1999;55:978–983. [PubMed]
16. Albert PS, Follmann DA. Modeling longitudinal count data subject to informative dropout. Biometrics. 2000;56:667–677. [PubMed]
17. Little RJA. Modeling the drop-out mechanism in longitudinal studies. Journal of the American Statistical Association. 1995;90:1112–1121.
18. Troxel AB, Harrington DP, Lipsitz SR. Analysis of longitudinal data with non-ignorable non-monotone missing values. Applied Statistics. 1998;47:425–438.
19. Albert PS, Follmann DA. A random effects transition model for longitudinal binary data with informative missingness. Statistica Neerlandica. 2003;57:100–111.
20. Yang X, Shoptaw S. Assessing missing data assumptions in longitudinal studies: an example using a smoking cessation trial. Drug and Alcohol Dependence. 2005;77:213–225. [PubMed]
21. Shoptaw S, Rotheram-Fuller E, Yang X, Frosch D, Nahom D, Jarvik ME, Rawson RA, Ling W. Smoking cessation in methadone maintenance. Addiction. 2002;97:1317–1328. [PubMed]
22. Jennrich RI, Schluchter MD. Unbalanced repeated measures model with structural covariance matrices. Biometrics. 1986;42:805–820. [PubMed]
23. Hogan JW, Laird NM. Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine. 1997;16:239–258. [PubMed]
24. Murray GD, Findlay JG. Correcting for the bias caused by drop-outs in hypertension trials. Statistics in Medicine. 1988;7(9):941–946. [PubMed]
25. Heckman JJ. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement. 1976;5:475–492.
26. Fitzmaurice GM, Molenberghs G, Lipsitz SR. Regression models for longitudinal binary responses with informative dropouts. Journal of the Royal Statistical Society: Series B. 1995;57:691–704.
27. Molenberghs G, Kenward MG, Lesaffre E. The analysis of longitudinal ordinal data with non-random dropout. Biometrika. 1997;84:33–44.
28. Nordheim EV. Inference from nonrandomly missing categorical data: an example from a genetic study on Turner’s syndrome. Journal of the American Statistical Association. 1984;79:772–780.
29. Kenward MG, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal studies data. Statistical Methods in Medical Research. 1999;8:51–83. [PubMed]
30. Glynn RJ, Laird NM, Rubin DB. Selection modeling versus mixture modeling with non-ignorable nonresponse. In: Wainer H, editor. Drawing Inferences from Self Selected Samples. Springer; New York: 1986. pp. 115–142.
31. Thijs H, Molenberghs G, Michiels B, Verbeke G, Curran D. Strategies to fit pattern-mixture models. Biostatistics. 2002;3(2):245–265. [PubMed]
32. Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88:125–134.
33. Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.
34. Ekholm A, Skinner C. The Muscatine children’s obesity data reanalysed using pattern mixture models. Applied Statistics. 1998;47:251–263.
35. Hogan JW, Laird NM. Intent-to-treat analysis for incomplete repeated measures. Biometrics. 1996;52:1002–1007. [PubMed]
36. Molenberghs G, Michiels B, Kenward MG, Diggle PJ. Missing data mechanisms and pattern-mixture models. Statistica Neerlandica. 1998;52:153–161.
37. Michiels B, Molenberghs G, Lipsitz SR. A pattern-mixture odds ratio model for incomplete categorical data. Communications in Statistics: Theory and Methods. 1999;28(12):2863–2870.
38. Birmingham J, Fitzmaurice GM. A pattern-mixture model for longitudinal binary responses with nonignorable nonresponse. Biometrics. 2002;58(4):989–996. [PubMed]
39. Birmingham J, Rotnitzky A, Fitzmaurice GM. Pattern-mixture and selection models for analysing longitudinal data with monotone missing patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2003;65(1):275–297.
40. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
41. Wu MC, Bailey KR. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics. 1989;45:939–955. [PubMed]
42. Wu MC, Follmann DA. Use of summary measures to adjust for informative missingness in repeated measures data with random effects. Biometrics. 1999;55:75–84. [PubMed]
43. Albert PS. A transitional model for longitudinal binary data subject to nonignorable missing data. Biometrics. 2000;56:602–608. [PubMed]
44. Pulkstenis EP, Ten Have TR, Landis R. Model for the analysis of binary longitudinal pain data subject to informative dropout through remedication. Journal of the American Statistical Association. 1998;93:438–450.
45. Ten Have TR, Kunselman AR, Pulkstenis EP, Landis R. Mixed effects logistic regression models for longitudinal binary repeated response data with informative drop-out. Biometrics. 1998;54:367–383. [PubMed]
46. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley; New York: 1987.
47. Rubin DB, Schenker N. Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. Journal of the American Statistical Association. 1986;81:366–374.
48. Shen ZJ. PhD Dissertation. Department of Statistics, Harvard University; Cambridge, MA: 2000. Nested multiple imputation.
49. Harel O. PhD Dissertation. Department of Statistics, Pennsylvania State University; University Park, PA: 2003. Strategies for data analysis with two types of missing values.
50. Rubin DB. Nested multiple imputation of NMES via partially incompatible MCMC. Statistica Neerlandica. 2003;57:3–18.
51. Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. Chapman & Hall; London: 1996.
52. Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association. 1996;91:883–904.
53. Yang X, Li J. A hybrid Gibbs sampler for selection models in dealing with outcome-dependent nonignorable dropouts. UCLA Statistics Electronic Publications. 2005 Preprint 451.
54. Gilks WR, Wild P. Adaptive rejection sampling for Gibbs sampling. Applied Statistics. 1992;41:337–348.
55. SAS Institute Inc. SAS/STAT Software: Changes and Enhancements, Release 8.2. SAS Institute Inc; 2001.
56. Hedeker D, Gibbons RD. Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods. 1997;2:64–78.