Test (Madr). Author manuscript; available in PMC 2011 September 22.
Published in final edited form as:
PMCID: PMC3178337

Discussion of “Missing Data Methods in Longitudinal Studies: A Review” by Ibrahim and Molenberghs

First, we would like to thank Joe and Geert for a carefully written review paper on longitudinal data. We would like to expand on several points discussed in this paper: 1) the interpretation of covariate effects and the use of identifying restrictions with covariates in mixture models (Section 4.2.2), and 2) issues with sensitivity analyses in parametric models for the full data and in selection models in general (Section 4.2.3).

1 Mixture Models

In the following, we focus on the setting of covariates that are collected at baseline with no missingness.

1.1 Interpretation of covariate effects

In longitudinal studies, as discussed here, the main focus of inference is usually on the marginal distribution, p(y). In mixture models, the full-data model p(y|x) is a mixture of component distributions over the missingness patterns r, i.e.

$$p(y \mid x) = \sum_{r} p(y \mid r, x)\, p(r \mid x).$$

Similarly, E[Y|x] is

$$E[Y \mid x] = \sum_{r} E[Y \mid r, x]\, p(r \mid x).$$

So, assessing the covariate effects on the marginal mean has to be done by averaging over patterns and needs to consider (1) whether the mean is linear in the covariates, (2) whether the marginal distribution of missingness depends on the covariates, and (3) whether the covariate effects are time-invariant. In this discussion, we will focus on the first issue.

For mixture models with an identity link, averaged covariate effects for the full-data distribution have a simple form as a weighted average of the pattern-specific covariate effects and have a straightforward interpretation (Fitzmaurice et al., 2001). As an example, suppose the full-data response Y = (Y1, …, Yn)′ is to be observed at time points {t1, …, tn}, and denote the baseline covariates by X. Assume dropout is monotone and independent of X, and let S be the dropout time with ϕs = P(S = ts) for s = 1, …, n and $\sum_{s=1}^{n} \phi_s = 1$.

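When the link function, denoted by g, is non-linear and the mean model within pattern s (S = ts) is

$$g\left(E[Y \mid X, S = t_s]\right) = X\beta^{(s)},$$

we have in general

$$g\left(E[Y \mid X]\right) = g\left(\sum_{s=1}^{n} \phi_s\, g^{-1}\left(X\beta^{(s)}\right)\right) \neq X \sum_{s=1}^{n} \phi_s\, \beta^{(s)}.$$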

So it can be difficult to capture the covariate effects compactly (Fitzmaurice et al., 2001; Wilkins and Fitzmaurice, 2006). Roy and Daniels (2008) proposed to specify marginalized models and impose constraints on the conditional mean. This is in the spirit of earlier work by Azzalini (1994) and Heagerty (1999). A simple version of the model in Roy and Daniels is illustrated below.

First, the marginal mean is specified as

$$g\left(E[Y_{ij} \mid x_{ij}]\right) = x_{ij}'\beta.$$

Second, a conditional model is specified to account for within-subject correlation and for dependence between the response and the missingness pattern. We assume that the Yij, conditional on random effects bi and missingness pattern Si, come from an exponential family with distribution




The conditional model has to be compatible with the marginal model. In particular, the intercepts Δij are determined by the relationship


and are functions of the other parameters in the model, including β. Note that this is marginalized over both missingness patterns and subject-specific random effects. Serial correlation within pattern can be addressed by augmenting the conditional model with Markov components (Heagerty, 2002).
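The difficulty with non-linear links noted earlier in this section can be checked numerically. The following is a minimal sketch, not taken from the paper: the two dropout patterns, their probabilities `phi`, and the pattern-specific (intercept, slope) pairs in `beta` are all hypothetical values chosen for illustration, with a single scalar covariate `x`. Under an identity link the marginal mean is exactly linear with the ϕ-weighted coefficients; under a logit link the marginal log-odds are not linear in `x`, so no single slope summarizes the covariate effect.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def identity(z):
    return z


# Hypothetical dropout-pattern probabilities (sum to 1) and hypothetical
# pattern-specific (intercept, slope) pairs; none of these come from the paper.
phi = [0.3, 0.7]
beta = [(-1.0, 0.5), (0.5, 2.0)]


def marginal_mean(x, inv_link):
    """E[Y | x] = sum_s phi_s * g^{-1}(b0_s + b1_s * x)."""
    return sum(p * inv_link(b0 + b1 * x) for p, (b0, b1) in zip(phi, beta))


# Weighted averages of the pattern-specific coefficients.
avg_b0 = sum(p * b0 for p, (b0, _) in zip(phi, beta))
avg_b1 = sum(p * b1 for p, (_, b1) in zip(phi, beta))

xs = (0.0, 1.0, 2.0)

# Identity link: the marginal mean equals the line with averaged coefficients.
identity_gap = max(
    abs(marginal_mean(x, identity) - (avg_b0 + avg_b1 * x)) for x in xs
)

# Logit link: successive changes in the marginal log-odds differ, so the
# marginal model is not a logistic regression with one slope.
marginal_logit = [logit(marginal_mean(x, expit)) for x in xs]
slope_01 = marginal_logit[1] - marginal_logit[0]
slope_12 = marginal_logit[2] - marginal_logit[1]

print("identity-link gap:", identity_gap)
print("logit-link slopes:", slope_01, slope_12, "averaged slope:", avg_b1)
```

In this toy setting the change in marginal log-odds from x = 0 to 1 differs from the change from x = 1 to 2, and neither equals the averaged slope, which is the motivation for specifying the marginal mean directly.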

1.2 Identifying restrictions with covariates

Identifying restrictions can be problematic in pattern mixture models when baseline covariates have time-invariant coefficients. We will focus here on the available case missing value (ACMV) restriction (Little, 1993; Molenberghs et al., 1998), which corresponds to MAR. Missing at random (MAR) is often taken as a starting point for the analysis of incomplete data (Troxel et al., 2004; Zhang and Heitjan, 2006).

To illustrate, consider Y = (Y1, Y2) being a bivariate normal response with missing data only in Y2. The missing data indicator R equals 1 or 0 corresponding to Y2 being observed or missing. Assume

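$$R \sim \mathrm{Bern}(\phi) \quad\text{and}\quad Y \mid R = r \sim N\left(\mu^{(r)}, \Sigma^{(r)}\right),$$

where

$$\mu^{(r)} = \begin{bmatrix} \mu_1^{(r)} \\ \mu_2^{(r)} \end{bmatrix} \quad\text{and}\quad \Sigma^{(r)} = \begin{bmatrix} \sigma_{11}^{(r)} & \sigma_{12}^{(r)} \\ \sigma_{12}^{(r)} & \sigma_{22}^{(r)} \end{bmatrix}$$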

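for r = 0, 1. For the bivariate case, the ACMV restriction is

$$[Y_2 \mid Y_1, R = 0] \simeq [Y_2 \mid Y_1, R = 1],$$

where ≃ denotes equality in distribution. This requires that for all Y1,

$$\mu_2^{(0)} + \frac{\sigma_{12}^{(0)}}{\sigma_{11}^{(0)}}\left(Y_1 - \mu_1^{(0)}\right) = \mu_2^{(1)} + \frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}\left(Y_1 - \mu_1^{(1)}\right) \quad\text{and}\quad \sigma_{22}^{(0)} - \frac{\left(\sigma_{12}^{(0)}\right)^2}{\sigma_{11}^{(0)}} = \sigma_{22}^{(1)} - \frac{\left(\sigma_{12}^{(1)}\right)^2}{\sigma_{11}^{(1)}},$$

which in turn implies that

$$\mu_2^{(0)} = \mu_2^{(1)} + \frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}\left(\mu_1^{(0)} - \mu_1^{(1)}\right), \quad \sigma_{12}^{(0)} = \sigma_{11}^{(0)}\,\frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}, \quad \sigma_{22}^{(0)} = \sigma_{22}^{(1)} + \left(\frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}\right)^2\left(\sigma_{11}^{(0)} - \sigma_{11}^{(1)}\right).$$

This restriction identifies the full-data response distribution.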

When there are baseline covariates with time-invariant coefficients, we have that

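$$\mu^{(r)} = \begin{bmatrix} \mu_1^{(r)} + X\beta^{(r)} \\ \mu_2^{(r)} + X\beta^{(r)} \end{bmatrix} \quad\text{and}\quad \Sigma^{(r)} = \begin{bmatrix} \sigma_{11}^{(r)} & \sigma_{12}^{(r)} \\ \sigma_{12}^{(r)} & \sigma_{22}^{(r)} \end{bmatrix}$$

for r = 0, 1, where X does not contain an intercept.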

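The MAR assumption requires that for all X and Y1,

$$\mu_2^{(0)} + X\beta^{(0)} + \frac{\sigma_{12}^{(0)}}{\sigma_{11}^{(0)}}\left(Y_1 - \mu_1^{(0)} - X\beta^{(0)}\right) = \mu_2^{(1)} + X\beta^{(1)} + \frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}\left(Y_1 - \mu_1^{(1)} - X\beta^{(1)}\right).$$

By simple algebra (matching the coefficients of Y1 and of X on the two sides), we can see that this restricts $\beta^{(0)}$ to be equal to $\beta^{(1)}$ whenever the common regression slope $\sigma_{12}^{(r)}/\sigma_{11}^{(r)}$ differs from 1. Note that both $\beta^{(0)}$ and $\beta^{(1)}$ are identified from the observed data. Therefore, the ACMV restriction/MAR assumption causes over-identification and has an impact on the model fit to the observed data. This is against the principle of applying identifying restrictions (Little, 1994). Ways to remedy this (and associated problems) are explored in Wang and Daniels (working paper).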
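The over-identification argument in this section can be illustrated numerically. The sketch below is only an illustration under assumed values: every parameter value is hypothetical, the covariate `x` is a scalar, and the helper names are ours. It first fills in the unidentified pattern r = 0 parameters so that Y2 | Y1 matches pattern r = 1, using standard bivariate normal conditioning, and then shows that once a covariate enters both means with pattern-specific coefficients, the two conditional means differ by an amount proportional to `x` unless the coefficients coincide.

```python
def cond_mean_var(mu1, mu2, s11, s12, s22, y1):
    """Mean and variance of Y2 | Y1 = y1 for a bivariate normal."""
    slope = s12 / s11
    return mu2 + slope * (y1 - mu1), s22 - s12 ** 2 / s11


# Hypothetical values. Pattern r = 1 (Y2 observed) is fully identified;
# for pattern r = 0 only the Y1 margin (mu1_0, s11_0) is identified.
mu1_1, mu2_1, s11_1, s12_1, s22_1 = 0.0, 1.0, 1.0, 0.6, 1.5
mu1_0, s11_0 = -0.5, 2.0

# ACMV: choose the remaining r = 0 parameters so Y2 | Y1 agrees across patterns.
slope = s12_1 / s11_1
mu2_0 = mu2_1 + slope * (mu1_0 - mu1_1)
s12_0 = slope * s11_0
s22_0 = s22_1 + slope ** 2 * (s11_0 - s11_1)

# Largest discrepancy in conditional mean or variance over a few Y1 values.
acmv_gap = max(
    abs(a - b)
    for y1 in (-2.0, 0.0, 3.0)
    for a, b in zip(
        cond_mean_var(mu1_0, mu2_0, s11_0, s12_0, s22_0, y1),
        cond_mean_var(mu1_1, mu2_1, s11_1, s12_1, s22_1, y1),
    )
)


def cond_mean_with_x(mu1, mu2, coef, y1, x):
    """Conditional mean of Y2 when x shifts both component means by x * coef."""
    return mu2 + x * coef + slope * (y1 - mu1 - x * coef)


# Hypothetical pattern-specific covariate coefficients (both identified from Y1).
beta_0, beta_1 = 0.8, 0.2


def mean_gap(x, y1=0.0):
    """Difference in E[Y2 | Y1, x] between patterns r = 0 and r = 1."""
    return (cond_mean_with_x(mu1_0, mu2_0, beta_0, y1, x)
            - cond_mean_with_x(mu1_1, mu2_1, beta_1, y1, x))


print("ACMV gap without covariates:", acmv_gap)
print("mean gap at x = 0, 1, 2:", mean_gap(0.0), mean_gap(1.0), mean_gap(2.0))
```

Here the gap equals (1 − slope) · x · (beta_0 − beta_1), so it vanishes for every `x` only when the two coefficients are equal (or the slope is exactly 1), which is the restriction on identified parameters described above.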

2 Issues with Sensitivity Analysis

Sensitivity analysis is critical in the longitudinal analysis of incomplete data with informative dropout, as stated in this paper. In the setting of missing data, the full-data model can be factored into an extrapolation model and an observed data model,

$$p(y, r \mid \omega) = p(y_{mis} \mid y_{obs}, r, \omega_E)\, p(y_{obs}, r \mid \omega_I),$$

where ωE are parameters indexing the extrapolation model and ωI are parameters indexing the observed data model, which are identifiable from the observed data (Daniels and Hogan, 2008). Full-data model inference requires unverifiable assumptions about the extrapolation model p(ymis|yobs, r, ωE). A sensitivity analysis explores the sensitivity of inferences of interest about the full-data response model to unverifiable assumptions about the extrapolation model. This is typically done by varying sensitivity parameters, which we define next. Suppose there exists a reparameterization ξ(ω) = (ξS, ξM) such that (1) ξS is a non-constant function of ωE, (2) the observed likelihood L(ξS, ξM|yobs, r) is constant as a function of ξS, and (3) given ξS fixed, L(ξS, ξM|yobs, r) is a non-constant function of ξM. A parameter ξS that satisfies these three conditions is a sensitivity parameter and can be used for sensitivity analysis and/or for incorporation of prior information (Daniels and Hogan, 2008).

2.1 In parametric models

Unfortunately, fully parametric selection models and shared parameter models do not allow sensitivity analysis as sensitivity parameters cannot be found (Daniels and Hogan, Chapter 8, 2008). Examining sensitivity to distributional assumptions, e.g., random effects, will provide different fits to the observed data, (yobs, r). In such cases, a sensitivity analysis cannot be done since varying the distributional assumptions does not provide equivalent fits to the observed data (Daniels and Hogan, 2008). It then becomes a problem of model selection. Next, we provide an example of the inability to find sensitivity parameters in a simple parametric selection model for binary data.

As an example, consider the situation when Y = (Y1, Y2) is a bivariate binary response with missing data only in Y2. Let R = 1 if Y2 is observed and R = 0 otherwise.

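Let $\omega_{y_1 y_2}^{(r)}$ be P(Y1 = y1, Y2 = y2, R = r) and $\omega_{y_1 +}^{(0)}$ be P(Y1 = y1, R = 0). A multinomial parameterization of the full-data model of Y and R is shown in Table 1.

Table 1
A multinomial parameterization of the full-data model for Y

| R | Y1 | Y2 = 0 | Y2 = 1 |
|---|----|--------|--------|
| 1 | 0 | $\omega_{00}^{(1)}$ | $\omega_{01}^{(1)}$ |
| 1 | 1 | $\omega_{10}^{(1)}$ | $\omega_{11}^{(1)}$ |
| 0 | 0 | $\omega_{00}^{(0)}$ | $\omega_{01}^{(0)}$ |
| 0 | 1 | $\omega_{10}^{(0)}$ | $\omega_{11}^{(0)}$ |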

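In this example, the set of parameters

$$\omega_I = \left\{\omega_{00}^{(1)}, \omega_{01}^{(1)}, \omega_{10}^{(1)}, \omega_{11}^{(1)}, \omega_{0+}^{(0)}, \omega_{1+}^{(0)}\right\}$$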

are identified by observed data without any modeling assumption. When a selection model is fully parametric, all its parameters can be identified by the observed data. To see this, we specify a parametric model for the bivariate binary example:

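$$\begin{aligned} \operatorname{logit} P(Y_1 = 1) &= \beta_0 \\ \operatorname{logit} P(Y_2 = 1 \mid Y_1) &= \beta_0 + \beta_1 Y_1 \\ \operatorname{logit} P(R_2 = 1 \mid Y_1, Y_2) &= \phi_0 + \tau Y_2. \end{aligned}$$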

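Note that we assume

$$\operatorname{logit} P(Y_1 = 1) = \operatorname{logit} P(Y_2 = 1 \mid Y_1 = 0) = \beta_0. \qquad (1)$$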

We will show that the full-data model is identified under the parametric assumptions by showing all parameters, (β0, β1, ϕ0, τ) can be written as a function of ωI, the identified ω’s.

First, note that

$$\beta_0 = \operatorname{logit} P(Y_1 = 1) = \operatorname{logit}\left(\omega_{10}^{(1)} + \omega_{11}^{(1)} + \omega_{1+}^{(0)}\right).$$

Also, by (1),

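$$\beta_0 = \operatorname{logit} P(Y_2 = 1 \mid Y_1 = 0) = \operatorname{logit} \frac{\omega_{01}^{(1)} + \omega_{01}^{(0)}}{\omega_{00}^{(1)} + \omega_{01}^{(1)} + \omega_{0+}^{(0)}}.$$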

This gives

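$$\omega_{01}^{(0)} = \left(\omega_{10}^{(1)} + \omega_{11}^{(1)} + \omega_{1+}^{(0)}\right)\left(\omega_{00}^{(1)} + \omega_{01}^{(1)} + \omega_{0+}^{(0)}\right) - \omega_{01}^{(1)} \quad\text{and}\quad \omega_{00}^{(0)} = \omega_{0+}^{(0)} - \omega_{01}^{(0)}. \qquad (3)$$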

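Consequently, since τ is the log odds ratio

$$\tau = \log\left\{ \frac{P(R_2 = 1, Y_2 = 1, Y_1 = 0)}{P(R_2 = 0, Y_2 = 1, Y_1 = 0)} \bigg/ \frac{P(R_2 = 1, Y_2 = 0, Y_1 = 0)}{P(R_2 = 0, Y_2 = 0, Y_1 = 0)} \right\},$$

it is identified by

$$\tau = \log\left\{ \frac{\omega_{01}^{(1)}\, \omega_{00}^{(0)}}{\omega_{01}^{(0)}\, \omega_{00}^{(1)}} \right\},$$

where $\omega_{00}^{(0)}$ and $\omega_{01}^{(0)}$ are identified by (3).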

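Further, since τ can also be expressed as

$$\tau = \log\left\{ \frac{\omega_{11}^{(1)}\, \omega_{10}^{(0)}}{\omega_{11}^{(0)}\, \omega_{10}^{(1)}} \right\},$$

we have that $\omega_{11}^{(0)}$ and $\omega_{10}^{(0)}$ are identified as

$$\omega_{11}^{(0)} = \omega_{1+}^{(0)}\, \frac{1}{1 + \dfrac{\omega_{00}^{(0)}\, \omega_{01}^{(1)}\, \omega_{10}^{(1)}}{\omega_{01}^{(0)}\, \omega_{00}^{(1)}\, \omega_{11}^{(1)}}} \quad\text{and}\quad \omega_{10}^{(0)} = \omega_{1+}^{(0)} - \omega_{11}^{(0)}.$$

Therefore, in this parametric selection model, the parameters $\omega_{00}^{(0)}$, $\omega_{01}^{(0)}$, $\omega_{10}^{(0)}$ and $\omega_{11}^{(0)}$ are all identified (as opposed to only their sums, $\omega_{0+}^{(0)}$ and $\omega_{1+}^{(0)}$).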

Finally, we can show that

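$$\beta_1 = \operatorname{logit} P(Y_2 = 1 \mid Y_1 = 1) - \beta_0 = \operatorname{logit} \frac{\omega_{11}^{(1)} + \omega_{11}^{(0)}}{\omega_{11}^{(1)} + \omega_{10}^{(1)} + \omega_{1+}^{(0)}} - \beta_0$$

and

$$\phi_0 = \operatorname{logit} P(R_2 = 1 \mid Y_2 = 0) = \operatorname{logit} \frac{\omega_{00}^{(1)} + \omega_{10}^{(1)}}{\omega_{00}^{(0)} + \omega_{10}^{(0)} + \omega_{00}^{(1)} + \omega_{10}^{(1)}}.$$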
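The identification argument of this section can be checked numerically. The following is a minimal sketch: the "true" parameter values are hypothetical, and the function and variable names are ours. It generates the full-data cell probabilities from the parametric selection model, keeps only the six identified quantities (the four observed cells with R = 1 and the two margins with R = 0), and recovers (β0, β1, ϕ0, τ) from those quantities alone by following the steps of the derivation.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def full_data_probs(beta0, beta1, phi0, tau):
    """omega[(y1, y2, r)] = P(Y1=y1, Y2=y2, R=r) under the parametric model."""
    omega = {}
    for y1 in (0, 1):
        p_y1 = expit(beta0) if y1 else 1.0 - expit(beta0)
        for y2 in (0, 1):
            p2 = expit(beta0 + beta1 * y1)
            p_y2 = p2 if y2 else 1.0 - p2
            p_obs = expit(phi0 + tau * y2)
            omega[(y1, y2, 1)] = p_y1 * p_y2 * p_obs
            omega[(y1, y2, 0)] = p_y1 * p_y2 * (1.0 - p_obs)
    return omega


def recover(w00, w01, w10, w11, w0p, w1p):
    """Recover (beta0, beta1, phi0, tau) from identified quantities only.

    w_ab = P(Y1=a, Y2=b, R=1) and w_ap = P(Y1=a, R=0).
    """
    p_y1 = w10 + w11 + w1p                       # P(Y1 = 1)
    beta0 = logit(p_y1)
    # Shared intercept: P(Y2=1 | Y1=0) = P(Y1=1), which splits w0p.
    w01_0 = p_y1 * (w00 + w01 + w0p) - w01
    w00_0 = w0p - w01_0
    tau = math.log(w01 * w00_0 / (w01_0 * w00))
    # Same odds ratio within Y1 = 1 splits w1p.
    ratio = w00_0 * w01 * w10 / (w01_0 * w00 * w11)
    w11_0 = w1p / (1.0 + ratio)
    w10_0 = w1p - w11_0
    beta1 = logit((w11 + w11_0) / (w11 + w10 + w1p)) - beta0
    phi0 = logit((w00 + w10) / (w00_0 + w10_0 + w00 + w10))
    return beta0, beta1, phi0, tau


truth = (0.3, -0.7, 0.4, 1.2)                    # hypothetical values
om = full_data_probs(*truth)
recovered = recover(
    om[(0, 0, 1)], om[(0, 1, 1)], om[(1, 0, 1)], om[(1, 1, 1)],
    om[(0, 0, 0)] + om[(0, 1, 0)],               # only the margin is observed
    om[(1, 0, 0)] + om[(1, 1, 0)],
)
print("truth:    ", truth)
print("recovered:", recovered)
```

Because τ is recovered exactly from observed-data quantities, it cannot be varied without changing the fit to the observed data, so it does not qualify as a sensitivity parameter in this model.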

2.2 In Bayesian semiparametric selection models

The factorization of a selection model provides a transparent way to understand the missing data mechanism. In Bayesian selection models, an intuitive prior specification assumes independence between the parameters of the missing data mechanism (ϕ) and the full data response (β)(Scharfstein et al., 2003).

However, in a Bayesian model under this prior specification, sensitivity parameters in a selection model, denoted by τ, can be (weakly) identified by the observed data, i.e. p(τ|yobs, r) ≠ pτ (τ), even though the observed data likelihood contains no information about the sensitivity parameters (Daniels and Hogan, 2008). We outline how this occurs in the following.

In general, a semi-parametric selection model might specify the full-data response distribution p(y; β) nonparametrically (or as saturated, for a categorical response), with a missing data mechanism (MDM) given as follows:

$$\operatorname{logit} P(R_j = 1 \mid R_{j-1} = 0, Y) = h_j(\bar{Y}_{j-1}; \phi) + q_j(Y; \tau)$$

for j = 1, …, J, where hj is an arbitrary smooth function and qj is a user-specified function that encodes assumptions about how the MDM depends on the missing data; its parameters τ are the sensitivity parameters. Note that qj(Y) = 0 implies MAR and qj(Y) = qj(Yj) implies non-future dependence.

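To see the cause of the weak identification, let θ = {ϕ, β} and let ωI be the identified parameters. By re-parameterizing the model, we may find a mapping, indexed by τ, between θ and ωI,

$$\omega_I = g_\tau(\theta).$$

Due to the mapping, even a priori independence between τ and θ will yield a priori dependence between τ and ωI, since

$$p(\omega_I \mid \tau) = p_\theta\left(g_\tau^{-1}(\omega_I)\right) \left| \frac{d\theta}{d\omega_I} \right|. \qquad (4)$$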

The Jacobian introduces the dependence.

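The posterior for the sensitivity parameters τ can be expressed as

$$p(\tau \mid y_{obs}, r) \propto p_\tau(\tau) \int p(y_{obs}, r \mid \omega_I)\, p(\omega_I \mid \tau)\, d\omega_I.$$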

Thus from (4), p(τ|yobs, r) ≠ pτ (τ).

As a concrete example, consider a bivariate binary response with missing data only in Y2 from the previous section. A saturated selection model can be specified as

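$$\begin{aligned} \operatorname{logit} P(Y_1 = 1) &= \beta_0 \\ \operatorname{logit} P(Y_2 = 1 \mid Y_1) &= \beta_1 + \beta_2 Y_1 \\ \operatorname{logit} P(R_2 = 1 \mid Y_1, Y_2) &= \phi_0 + \phi_1 Y_1 + \tau Y_2 \end{aligned}$$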

and θ = {β0, β1, β2, ϕ0, ϕ1}. MAR holds when τ = 0. Note that τ is not identified by the observed data. It can be shown that for any Δτ, there exists Δθ such that

$$p(y_{obs}, r \mid \tau, \theta) = p(y_{obs}, r \mid \tau + \Delta\tau, \theta + \Delta\theta),$$

i.e., (τ, θ) and (τ + Δτ, θ + Δθ) will yield the same law of the observed data.
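This non-identifiability can be made concrete with a small numerical sketch. The construction below is ours, not taken from the paper, and the starting parameter values are hypothetical. For a fixed observed-data law and any chosen τ, `theta_given_tau` solves for the θ that reproduces that law: within each level of Y1 it splits the unobserved margin P(Y1 = y1, R = 0) so that the odds ratio between R2 and Y2 equals exp(τ), and then reads off the logits.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def observed_law(beta0, beta1, beta2, phi0, phi1, tau):
    """Observed-data probabilities: (y1, y2) -> P(Y1, Y2, R=1),
    and (y1, '+') -> P(Y1, R=0)."""
    obs = {}
    for y1 in (0, 1):
        p_y1 = expit(beta0) if y1 else 1.0 - expit(beta0)
        missing = 0.0
        for y2 in (0, 1):
            p2 = expit(beta1 + beta2 * y1)
            p_y2 = p2 if y2 else 1.0 - p2
            p_obs = expit(phi0 + phi1 * y1 + tau * y2)
            obs[(y1, y2)] = p_y1 * p_y2 * p_obs
            missing += p_y1 * p_y2 * (1.0 - p_obs)
        obs[(y1, "+")] = missing
    return obs


def theta_given_tau(obs, tau):
    """Solve for (beta0, beta1, beta2, phi0, phi1) reproducing obs at this tau."""
    beta0 = logit(obs[(1, 0)] + obs[(1, 1)] + obs[(1, "+")])
    y2_logits, r_logits = [], []
    for y1 in (0, 1):
        # k = P(y1, Y2=0, R=0) / P(y1, Y2=1, R=0) implied by odds ratio exp(tau).
        k = math.exp(tau) * obs[(y1, 0)] / obs[(y1, 1)]
        m1 = obs[(y1, "+")] / (1.0 + k)          # P(y1, Y2=1, R=0)
        m0 = obs[(y1, "+")] - m1                 # P(y1, Y2=0, R=0)
        total = obs[(y1, 0)] + obs[(y1, 1)] + obs[(y1, "+")]
        y2_logits.append(logit((obs[(y1, 1)] + m1) / total))
        r_logits.append(logit(obs[(y1, 0)] / (obs[(y1, 0)] + m0)))
    return (beta0, y2_logits[0], y2_logits[1] - y2_logits[0],
            r_logits[0], r_logits[1] - r_logits[0])


base_theta = (0.2, -0.4, 0.9, 0.5, -0.3)         # hypothetical values
base = observed_law(*base_theta, 0.0)            # generated under MAR (tau = 0)

thetas, gaps = {}, {}
for tau in (-1.0, 0.5, 2.0):
    thetas[tau] = theta_given_tau(base, tau)
    alt = observed_law(*thetas[tau], tau)
    gaps[tau] = max(abs(alt[key] - base[key]) for key in base)
    print(tau, thetas[tau], gaps[tau])
```

Every τ tried reproduces the same observed-data probabilities with a different θ, so the observed-data likelihood is flat in τ, in contrast with the parametric model of Section 2.1.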

Let $\theta^* = \{e^{\beta_0}, e^{\beta_1}, e^{\beta_2}, e^{\phi_0}, e^{\phi_1}\}$ and $\tau^* = e^{\tau}$. We can derive that


The a priori dependence of p(ωI|τ) is thus introduced by the Jacobian $\left| d\theta^* / d\omega_I \right|$. This has been pointed out in Scharfstein et al. (2003) and explored further in Wang et al. (working paper).


  • Azzalini A. Logistic regression for autocorrelated data with application to repeated measures. Biometrika. 1994;81(4):767–775.
  • Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC; 2008.
  • Fitzmaurice GM, Laird NM, Shneyer L. An Alternative Parameterization of the General Linear Mixture Model for Longitudinal Data with Non-ignorable Drop-outs. Statistics in Medicine. 2001;20(7):1009–1021.
  • Heagerty PJ. Marginally Specified Logistic-Normal Models for Longitudinal Binary Data. Biometrics. 1999;55(3):688–698.
  • Heagerty PJ. Marginalized Transition Models and Likelihood Inference for Longitudinal Categorical Data. Biometrics. 2002;58(2):342–351.
  • Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88(421):125–134.
  • Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81(3):471–483.
  • Molenberghs G, Michiels B, Kenward MG, Diggle PJ. Monotone missing data and pattern-mixture models. Statistica Neerlandica. 1998;52(2):153–161.
  • Roy J, Daniels MJ. A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics. 2008;64:538–545.
  • Scharfstein DO, Daniels MJ, Robins JM. Incorporating prior beliefs about selection bias into the analysis of randomized trials with missing outcomes. Biostatistics. 2003;4(4):495.
  • Troxel AB, Ma G, Heitjan DF. An Index of Local Sensitivity to Nonignorability. Statistica Sinica. 2004;14(4):1221–1238.
  • Wang C, Daniels MJ. A note on identifying restriction in normal mixture models with and without covariates for incomplete data. Working paper.
  • Wang C, Daniels MJ, Scharfstein DO. Bayesian semiparametric selection model with application to a breast cancer prevention trial. Working paper.
  • Wilkins KJ, Fitzmaurice GM. A Hybrid Model for Nonignorable Dropout in Longitudinal Binary Responses. Biometrics. 2006;62(1):168–176.
  • Zhang J, Heitjan DF. A Simple Local Sensitivity Analysis Tool for Nonignorable Coarsening: Application to Dependent Censoring. Biometrics. 2006;62(4):1260–1268.