
Discussion of “Missing Data Methods in Longitudinal Studies: A Review” by Ibrahim and Molenberghs

First, we would like to thank Joe and Geert for a carefully written review paper on missing data methods in longitudinal studies. We would like to expand on several points discussed in the paper: 1) the interpretation of covariate effects and the use of identifying restrictions with covariates in mixture models (Section 4.2.2), and 2) issues with sensitivity analysis in fully parametric models for the full data and in selection models in general (Section 4.2.3).

1 Mixture Models

In the following, we focus on the setting of covariates that are collected at baseline with no missingness.

1.1 Interpretation of covariate effects

In longitudinal studies, as discussed in the paper, the main focus of inference is usually on the marginal distribution of the response, p(y | x). In mixture models, the full-data model p(y | x) is a mixture of component distributions over the missing-data patterns r, i.e.,

$$p(y \mid x) = \sum_{r} p(y \mid r, x)\, p(r \mid x).$$

Similarly, E[Y | x] is

$$E[Y \mid x] = \sum_{r} E[Y \mid r, x]\, P(R = r \mid x).$$

Thus, assessing covariate effects on the marginal mean requires averaging over patterns, and one needs to consider (1) whether the mean is linear in the covariates, (2) whether the marginal distribution of missingness depends on the covariates, and (3) whether the covariate effects are time-invariant. In this discussion, we focus on the first issue.

For mixture models with an identity link, averaged covariate effects for the full-data distribution take a simple form as a weighted average of pattern-specific covariate effects and have a straightforward interpretation (Fitzmaurice et al., 2001). As an example, suppose the full-data response Y = (Y_1, …, Y_n)′ is to be observed at time points {t_1, …, t_n}, and denote the baseline covariates by X. Assume dropout is monotone and independent of X, and let S be the dropout time with $\phi_s = P(S = t_s)$ for s = 1, …, n and $\sum_{s=1}^{n} \phi_s = 1$.

When the link function, denoted by g, is non-linear and the within-pattern (S = t_s) mean model is

$$g\{E[Y_j \mid S = t_s, x]\} = x'\beta_j^{(s)},$$

we have in general

$$E[Y_j \mid x] = \sum_{s=1}^{n} \phi_s\, g^{-1}\{x'\beta_j^{(s)}\},$$

which cannot in general be written as $g^{-1}(x'\beta_j)$ for any single $\beta_j$.
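For instance, with a logit link the pattern-averaged mean is no longer logit-linear in x. The following numerical sketch (with made-up pattern probabilities and coefficients) makes this concrete:

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def logit(p):
    return np.log(p / (1.0 - p))

# Pattern probabilities and pattern-specific (intercept, slope); made-up values
phi = np.array([0.3, 0.7])
beta = np.array([[-1.0, 0.5],
                 [0.5, 1.5]])

x = np.linspace(-3.0, 3.0, 7)
# Marginal mean: E[Y | x] = sum_s phi_s * g^{-1}(x' beta^{(s)})
marg_mean = sum(phi[s] * expit(beta[s, 0] + beta[s, 1] * x) for s in range(2))

# If E[Y | x] were logit-linear in x, these finite-difference slopes of
# logit(E[Y | x]) would all be equal; they are not.
slopes = np.diff(logit(marg_mean)) / np.diff(x)
print(np.round(slopes, 3))
```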

So it can be difficult to capture the covariate effects compactly (Fitzmaurice et al., 2001; Wilkins and Fitzmaurice, 2006). Roy and Daniels (2008) proposed specifying marginalized models that impose constraints on the conditional mean, in the spirit of earlier work by Azzalini (1994) and Heagerty (1999). A simple version of the model in Roy and Daniels is illustrated below.

First, the marginal mean is specified as

$$g\{E[Y_{ij} \mid x_i]\} = x_{ij}'\beta.$$

Second, a conditional model is specified to account for within-subject correlation and for dependence between the response and the missingness pattern. We assume that Y_ij, conditional on a random effect b_i and the missingness pattern S_i, follows an exponential family distribution

$$p(y_{ij} \mid b_i, S_i = t_s, x_i) = \exp\left\{ \frac{y_{ij}\,\eta_{ijs} - a(\eta_{ijs})}{\phi} + c(y_{ij}, \phi) \right\},$$

where the natural parameter $\eta_{ijs}$ corresponds to the conditional mean

$$g\{E[Y_{ij} \mid b_i, S_i = t_s, x_i]\} = \Delta_{ij} + \alpha_s + b_i.$$

The conditional model has to be compatible with the marginal model. In particular, the intercepts $\Delta_{ij}$ are determined by the relationship

$$g^{-1}(x_{ij}'\beta) = \sum_{s} P(S_i = t_s \mid x_i) \int g^{-1}(\Delta_{ij} + \alpha_s + b)\, dF(b),$$

and are therefore functions of the other parameters in the model, including β. Note that this relationship marginalizes over both the missingness patterns and the subject-specific random effects. Serial correlation within pattern can be addressed by augmenting the conditional model with a Markov component (Heagerty, 2002).
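As a numerical sketch of how a $\Delta_{ij}$ can be computed (assuming a logit link, a normal random intercept, and the pattern-effect parameterization above; all values are made up), one can solve the marginalization constraint by quadrature and root-finding:

```python
import numpy as np
from scipy.optimize import brentq

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

phi = np.array([0.4, 0.6])     # P(S_i = t_s); made-up
alpha = np.array([-0.8, 0.3])  # pattern effects alpha_s; made-up
sigma = 1.2                    # SD of the random intercept b_i
marg_lp = 0.7                  # x_ij' beta, the marginal linear predictor

# Gauss-Hermite rule for E_b[expit(.)] with b ~ N(0, sigma^2)
nodes, weights = np.polynomial.hermite.hermgauss(40)

def mixture_mean(delta):
    """sum_s phi_s * E_b[ g^{-1}(delta + alpha_s + b) ]"""
    total = 0.0
    for s in range(len(phi)):
        vals = expit(delta + alpha[s] + np.sqrt(2.0) * sigma * nodes)
        total += phi[s] * weights.dot(vals) / np.sqrt(np.pi)
    return total

# Delta_ij solves: g^{-1}(x_ij' beta) = mixture_mean(Delta_ij)
delta_ij = brentq(lambda d: mixture_mean(d) - expit(marg_lp), -20.0, 20.0)
print(round(delta_ij, 4))
```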

1.2 Identifying restrictions with covariates

Identifying restrictions can be problematic in pattern-mixture models with baseline covariates that have time-invariant coefficients. Here we focus on the available case missing value (ACMV) restriction (Little, 1993; Molenberghs et al., 1998), which corresponds to MAR for monotone missingness. Missing at random (MAR) is often taken as a starting point for the analysis of incomplete data (Troxel et al., 2004; Zhang and Heitjan, 2006).

To illustrate, let Y = (Y_1, Y_2) be a bivariate normal response with missing data only in Y_2. The missing-data indicator R equals 1 or 0 according to whether Y_2 is observed or missing. Assume

$$Y \mid R = r \sim N(\mu^{(r)}, \Sigma^{(r)}),$$

where

$$\mu^{(r)} = \begin{pmatrix} \mu_1^{(r)} \\ \mu_2^{(r)} \end{pmatrix}, \qquad \Sigma^{(r)} = \begin{pmatrix} \sigma_{11}^{(r)} & \sigma_{12}^{(r)} \\ \sigma_{12}^{(r)} & \sigma_{22}^{(r)} \end{pmatrix},$$

for r = 0, 1. For the bivariate case, the ACMV restriction is

$$[Y_2 \mid Y_1, R = 0] \;\stackrel{d}{=}\; [Y_2 \mid Y_1, R = 1],$$

where $\stackrel{d}{=}$ denotes equality in distribution. This requires that, for all $y_1$,

$$E[Y_2 \mid Y_1 = y_1, R = 0] = E[Y_2 \mid Y_1 = y_1, R = 1] \quad\text{and}\quad \mathrm{Var}(Y_2 \mid Y_1 = y_1, R = 0) = \mathrm{Var}(Y_2 \mid Y_1 = y_1, R = 1),$$

which in turn implies that

$$\frac{\sigma_{12}^{(0)}}{\sigma_{11}^{(0)}} = \frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}, \qquad \mu_2^{(0)} - \frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}\,\mu_1^{(0)} = \mu_2^{(1)} - \frac{\sigma_{12}^{(1)}}{\sigma_{11}^{(1)}}\,\mu_1^{(1)}, \qquad \sigma_{22}^{(0)} - \frac{(\sigma_{12}^{(0)})^2}{\sigma_{11}^{(0)}} = \sigma_{22}^{(1)} - \frac{(\sigma_{12}^{(1)})^2}{\sigma_{11}^{(1)}}.$$

This restriction identifies the full-data response distribution.
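As a numerical sketch (with made-up pattern-1 parameters and made-up observed pattern-0 moments), the ACMV-identified pattern-0 parameters can be computed by matching the regression of Y_2 on Y_1 across patterns:

```python
import numpy as np

# Pattern r = 1 (fully identified); made-up values
mu_1 = np.array([0.0, 1.0])           # (mu_1^{(1)}, mu_2^{(1)})
Sig_1 = np.array([[1.0, 0.5],
                  [0.5, 2.0]])        # Sigma^{(1)}

# Identified pieces of pattern r = 0: mean and variance of Y1
mu1_0, s11_0 = 0.3, 1.5

# ACMV: the regression of Y2 on Y1 (slope, intercept, residual variance)
# is the same in both patterns
slope = Sig_1[0, 1] / Sig_1[0, 0]
intercept = mu_1[1] - slope * mu_1[0]
resid_var = Sig_1[1, 1] - Sig_1[0, 1] ** 2 / Sig_1[0, 0]

# Unidentified pattern-0 parameters implied by ACMV
mu2_0 = intercept + slope * mu1_0
s12_0 = slope * s11_0
s22_0 = resid_var + slope ** 2 * s11_0
print(mu2_0, s12_0, s22_0)
```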

When there are baseline covariates with time-invariant coefficients, we have that

$$E[Y_j \mid x, R = r] = \mu_j^{(r)} + x'\beta^{(r)}, \qquad j = 1, 2,$$

for r = 0, 1, where x does not contain an intercept.

The MAR assumption requires that, for all x and $y_1$,

$$[Y_2 \mid Y_1 = y_1, X = x, R = 0] \;\stackrel{d}{=}\; [Y_2 \mid Y_1 = y_1, X = x, R = 1].$$

By simple algebra, we can see that this restricts $\beta^{(0)}$ to be equal to $\beta^{(1)}$. Note that both $\beta^{(0)}$ and $\beta^{(1)}$ are identified from the observed data: $\beta^{(1)}$ from the complete cases and $\beta^{(0)}$ from $Y_1$, which is always observed. Therefore, the ACMV restriction/MAR assumption causes over-identification and affects the fit of the model to the observed data. This goes against the principle of applying identifying restrictions, namely that they should place no constraints on the observed-data distribution (Little, 1994). Ways to remedy this (and associated problems) are explored in Wang and Daniels (working paper).
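The over-identification is easy to see by simulation. In the sketch below (a made-up data-generating process), both $\beta^{(0)}$ and $\beta^{(1)}$ are recovered from the observed data alone, so the constraint $\beta^{(0)} = \beta^{(1)}$ imposed by ACMV/MAR restricts the fit to the observed data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200000

# Made-up data-generating process with pattern-specific covariate effects
x = rng.normal(size=n)
r = rng.binomial(1, 0.5, size=n)              # 1 = Y2 observed, 0 = missing
y1 = np.where(r == 1, 1.0 + 2.0 * x,          # pattern 1: beta^(1) = 2
                       0.0 + 1.0 * x)         # pattern 0: beta^(0) = 1
y1 = y1 + rng.normal(size=n)

# Both coefficients are estimable from observed data (Y1 is never missing)
for pat in (0, 1):
    xs, ys = x[r == pat], y1[r == pat]
    slope = np.cov(xs, ys)[0, 1] / np.var(xs)
    print(f"pattern r = {pat}: estimated beta = {slope:.3f}")
# Forcing beta^(0) = beta^(1), as ACMV/MAR requires here, therefore
# constrains the fit to (y_obs, r).
```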

2 Issues with Sensitivity Analysis

Sensitivity analysis is critical in the analysis of incomplete longitudinal data with informative dropout, as stated in the paper. In the setting of missing data, the full-data model can be factored into an extrapolation model and an observed-data model,

$$p(y, r \mid \omega) = p(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, r, \omega_E)\, p(y_{\mathrm{obs}}, r \mid \omega_I),$$

where $\omega_E$ are the parameters indexing the extrapolation model and $\omega_I$ are the parameters indexing the observed-data model, the latter being identifiable from the observed data (Daniels and Hogan, 2008). Full-data inference requires unverifiable assumptions about the extrapolation model $p(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, r, \omega_E)$. A sensitivity analysis explores the sensitivity of inferences of interest about the full-data response model to these unverifiable assumptions. This is typically done by varying sensitivity parameters, which we define next. Suppose there exists a reparameterization $\xi(\omega) = (\xi_S, \xi_M)$ such that (1) $\xi_S$ is a non-constant function of $\omega_E$; (2) the observed-data likelihood $L(\xi_S, \xi_M \mid y_{\mathrm{obs}}, r)$ is constant as a function of $\xi_S$; and (3) for fixed $\xi_S$, $L(\xi_S, \xi_M \mid y_{\mathrm{obs}}, r)$ is a non-constant function of $\xi_M$. A parameter $\xi_S$ satisfying these three conditions is a sensitivity parameter and can be used for sensitivity analysis and/or for the incorporation of prior information (Daniels and Hogan, 2008).

2.1 In parametric models

Unfortunately, fully parametric selection models and shared-parameter models do not allow sensitivity analysis, as sensitivity parameters cannot be found (Daniels and Hogan, 2008, Chapter 8). Examining sensitivity to distributional assumptions, e.g., on the random effects, provides different fits to the observed data $(y_{\mathrm{obs}}, r)$. In such cases a sensitivity analysis cannot be done, since varying the distributional assumptions does not provide equivalent fits to the observed data (Daniels and Hogan, 2008); it becomes a problem of model selection instead. Next, we provide an example of the inability to find sensitivity parameters in a simple parametric selection model for binary data.

As an example, consider the situation when Y = (Y1, Y2) is a bivariate binary response with missing data only in Y2. Let R = 1 if Y2 is observed and R = 0 otherwise.

Let $\omega^{(r)}_{y_1 y_2} = P(Y_1 = y_1, Y_2 = y_2, R = r)$ and $\omega^{(0)}_{y_1 +} = P(Y_1 = y_1, R = 0)$. A multinomial parameterization of the full-data model for (Y, R) is shown in Table 1.

Table 1
A multinomial parameterization of the full-data model for (Y, R)

  (y_1, y_2)    R = 1                  R = 0
  (0, 0)        $\omega^{(1)}_{00}$    $\omega^{(0)}_{00}$
  (0, 1)        $\omega^{(1)}_{01}$    $\omega^{(0)}_{01}$
  (1, 0)        $\omega^{(1)}_{10}$    $\omega^{(0)}_{10}$
  (1, 1)        $\omega^{(1)}_{11}$    $\omega^{(0)}_{11}$

(Under R = 0, only the sums $\omega^{(0)}_{y_1+} = \omega^{(0)}_{y_1 0} + \omega^{(0)}_{y_1 1}$ are observed.)

In this example, the set of parameters

$$\omega_I = \{\omega^{(1)}_{11},\ \omega^{(1)}_{10},\ \omega^{(1)}_{01},\ \omega^{(1)}_{00},\ \omega^{(0)}_{1+},\ \omega^{(0)}_{0+}\}$$

is identified by the observed data without any modeling assumptions. When a selection model is fully parametric, all of its parameters can be identified by the observed data. To see this, we specify a parametric model for the bivariate binary example:

$$p(y_1, y_2, r) = p(y_1, y_2)\, p(r \mid y_1, y_2).$$

Note that we assume

$$\operatorname{logit} P(R = 1 \mid Y_1, Y_2) = \phi_0 + \tau Y_2$$
(1)

and

$$\operatorname{logit} P(Y_2 = 1 \mid Y_1) = \beta_0 + \beta_1 Y_1.$$
(2)

We will show that the full-data model is identified under these parametric assumptions by showing that all of the parameters, $(\beta_0, \beta_1, \phi_0, \tau)$, can be written as functions of $\omega_I$, the identified ω's (the marginal distribution of $Y_1$ is trivially identified, since $Y_1$ and R are always observed).

First, note that, by the definition of the ω's,

$$\omega^{(0)}_{y_1+} = \omega^{(0)}_{y_1 1} + \omega^{(0)}_{y_1 0}.$$

Also, by (1),

$$\frac{\omega^{(0)}_{y_1 y_2}}{\omega^{(1)}_{y_1 y_2}} = \frac{P(R = 0 \mid Y_2 = y_2)}{P(R = 1 \mid Y_2 = y_2)} = e^{-(\phi_0 + \tau y_2)} \equiv c_{y_2},$$

which does not depend on $y_1$. This gives, for $y_1 = 0, 1$, the pair of linear equations

$$\omega^{(0)}_{y_1+} = c_1\, \omega^{(1)}_{y_1 1} + c_0\, \omega^{(1)}_{y_1 0},$$
(3)

whose solution identifies $(c_0, c_1)$ (provided $\omega^{(1)}_{11}\omega^{(1)}_{00} \ne \omega^{(1)}_{10}\omega^{(1)}_{01}$, i.e., $\beta_1 \ne 0$), and hence identifies $\omega^{(0)}_{y_1 y_2} = c_{y_2}\, \omega^{(1)}_{y_1 y_2}$.

As a consequence, since τ has the interpretation of a log odds ratio,

$$\tau = \log \frac{P(R = 1 \mid Y_2 = 1)/P(R = 0 \mid Y_2 = 1)}{P(R = 1 \mid Y_2 = 0)/P(R = 0 \mid Y_2 = 0)},$$

it is identified by

$$\tau = \log \frac{\omega^{(1)}_{y_1 1}\, \omega^{(0)}_{y_1 0}}{\omega^{(0)}_{y_1 1}\, \omega^{(1)}_{y_1 0}},$$

where the $\omega^{(0)}_{y_1 y_2}$ are identified by (3).

Further, since τ can also be expressed as

$$\tau = \operatorname{logit} P(R = 1 \mid Y_2 = 1) - \operatorname{logit} P(R = 1 \mid Y_2 = 0),$$

we have that the conditional probabilities $P(R = 1 \mid Y_2 = y_2)$ are identified as

$$P(R = 1 \mid Y_2 = y_2) = \frac{\omega^{(1)}_{1 y_2} + \omega^{(1)}_{0 y_2}}{\omega^{(1)}_{1 y_2} + \omega^{(1)}_{0 y_2} + \omega^{(0)}_{1 y_2} + \omega^{(0)}_{0 y_2}} = \frac{1}{1 + c_{y_2}}.$$

Therefore, in this parametric selection model, the individual cell probabilities $\omega^{(0)}_{y_1 y_2}$ are all identified (as opposed to only their sums, $\omega^{(0)}_{y_1+}$).

Finally, we can show that

$$\phi_0 = \operatorname{logit} P(R = 1 \mid Y_2 = 0) = -\log c_0$$

and

$$\beta_0 = \log \frac{\omega^{(1)}_{01} + \omega^{(0)}_{01}}{\omega^{(1)}_{00} + \omega^{(0)}_{00}}, \qquad \beta_1 = \log \frac{\omega^{(1)}_{11} + \omega^{(0)}_{11}}{\omega^{(1)}_{10} + \omega^{(0)}_{10}} - \beta_0,$$

so that every parameter of the full-data model is a function of $\omega_I$ and no sensitivity parameter remains.
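The identification argument can be checked numerically. The sketch below (with made-up true parameter values) recovers $(\phi_0, \tau)$ and the individual missing-data cells from the identified quantities $\omega_I$ alone:

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

# Made-up true parameters
p_y1, beta0, beta1, phi0, tau = 0.4, -0.5, 1.0, 0.2, -0.8

omega1 = np.zeros((2, 2))  # omega^(1)_{y1 y2}: identified cell by cell
omega0 = np.zeros((2, 2))  # omega^(0)_{y1 y2}: only row sums identified
for y1 in (0, 1):
    p2 = expit(beta0 + beta1 * y1)                 # P(Y2 = 1 | Y1 = y1)
    for y2 in (0, 1):
        joint = (p_y1 if y1 else 1 - p_y1) * (p2 if y2 else 1 - p2)
        pr = expit(phi0 + tau * y2)                # MDM free of y1, eq. (1)
        omega1[y1, y2] = joint * pr
        omega0[y1, y2] = joint * (1 - pr)
omega0_plus = omega0.sum(axis=1)                   # the identified sums

# Eq. (3): omega0_plus[y1] = c1 * omega1[y1, 1] + c0 * omega1[y1, 0]
A = np.array([[omega1[0, 1], omega1[0, 0]],
              [omega1[1, 1], omega1[1, 0]]])
c1, c0 = np.linalg.solve(A, omega0_plus)

phi0_hat = -np.log(c0)                             # c0 = e^{-phi0}
tau_hat = -np.log(c1) - phi0_hat                   # c1 = e^{-(phi0 + tau)}
print(round(phi0_hat, 4), round(tau_hat, 4))
print(np.allclose(omega0, omega1 * np.array([c0, c1])))  # cells recovered
```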

2.2 In Bayesian semiparametric selection models

The factorization of a selection model provides a transparent way to understand the missing data mechanism. In Bayesian selection models, an intuitive prior specification assumes independence between the parameters of the missing data mechanism (ϕ) and the full data response (β)(Scharfstein et al., 2003).

However, in a Bayesian model under this prior specification, the sensitivity parameters in a selection model, denoted by τ, can be (weakly) identified by the observed data, i.e., $p(\tau \mid y_{\mathrm{obs}}, r) \ne p_\tau(\tau)$, even though the observed-data likelihood contains no information about them (Daniels and Hogan, 2008). We outline how this occurs in the following.

In general, a semiparametric selection model might specify the full-data response distribution nonparametrically (or as saturated for a categorical response), p(y; β), with a missing data mechanism given as follows:

$$\operatorname{logit} P(R_j = 1 \mid Y) = h_j(y_1, \ldots, y_{j-1}) + q_j(y_j, \ldots, y_J; \tau)$$

for j = 1, …, J, where $h_j$ is an arbitrary smooth function of the observed history and $q_j$ is a user-specified function that encodes assumptions about how the MDM depends on the missing data; its parameters τ are the sensitivity parameters. Note that $q_j \equiv 0$ implies MAR, and $q_j$ depending on y only through $y_j$ implies non-future dependence.

To see the cause of the weak identification, let θ = {ϕ, β} and let $\omega_I$ denote the identified parameters. By reparameterizing the model, we may find a mapping, indexed by τ, between θ and $\omega_I$,

$$\theta = f(\omega_I, \tau).$$

Due to this mapping, even a priori independence between τ and θ will yield a priori dependence between τ and $\omega_I$, since

$$p(\omega_I \mid \tau) = p_\theta\{f(\omega_I, \tau)\} \left| \frac{\partial f(\omega_I, \tau)}{\partial \omega_I} \right|.$$
(4)

The Jacobian introduces the dependence.

The posterior for the sensitivity parameters τ can be expressed as

$$p(\tau \mid y_{\mathrm{obs}}, r) \propto p_\tau(\tau) \int p(y_{\mathrm{obs}}, r \mid \omega_I)\, p(\omega_I \mid \tau)\, d\omega_I.$$

Thus, from (4), $p(\tau \mid y_{\mathrm{obs}}, r) \ne p_\tau(\tau)$.

As a concrete example, consider again the bivariate binary response with missing data only in $Y_2$ from the previous section. A saturated selection model can be specified as

$$\operatorname{logit} P(Y_1 = 1) = \beta_0, \qquad \operatorname{logit} P(Y_2 = 1 \mid Y_1) = \beta_1 + \beta_2 Y_1, \qquad \operatorname{logit} P(R = 1 \mid Y_1, Y_2) = \phi_0 + \phi_1 Y_1 + \tau Y_2,$$

with θ = {β_0, β_1, β_2, ϕ_0, ϕ_1}. MAR holds when τ = 0. Note that τ is not identified by the observed data: it can be shown that for any Δτ there exists a Δθ such that

$$p(y_{\mathrm{obs}}, r \mid \theta + \Delta\theta, \tau + \Delta\tau) = p(y_{\mathrm{obs}}, r \mid \theta, \tau),$$

i.e., (τ, θ) and (τ + Δτ, θ + Δθ) yield the same law for the observed data.
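This non-identification can be verified numerically: in the sketch below (with made-up parameter values), shifting τ and re-solving for θ reproduces the observed-data law exactly:

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def logit(p):
    return np.log(p / (1.0 - p))

def cells(beta0, beta1, beta2, phi0, phi1, tau):
    """Full-data cells omega^(r)_{y1 y2} implied by (theta, tau)."""
    omega1, omega0 = np.zeros((2, 2)), np.zeros((2, 2))
    for y1 in (0, 1):
        p1 = expit(beta0) if y1 else 1 - expit(beta0)
        for y2 in (0, 1):
            p2 = expit(beta1 + beta2 * y1)
            joint = p1 * (p2 if y2 else 1 - p2)
            pr = expit(phi0 + phi1 * y1 + tau * y2)
            omega1[y1, y2], omega0[y1, y2] = joint * pr, joint * (1 - pr)
    return omega1, omega0

theta = dict(beta0=-0.3, beta1=0.2, beta2=0.9, phi0=0.5, phi1=-0.4)
tau = -0.6
omega1, omega0 = cells(**theta, tau=tau)
obs_cells, obs_margins = omega1, omega0.sum(axis=1)   # observed-data law

def theta_from(omega1, omega0_plus, tau):
    """The mapping theta = f(omega_I, tau)."""
    # e^{phi0 + phi1*y1} = (omega1[y1,0] + e^{-tau} omega1[y1,1]) / omega0_plus[y1]
    lp = np.log((omega1[:, 0] + np.exp(-tau) * omega1[:, 1]) / omega0_plus)
    phi0, phi1 = lp[0], lp[1] - lp[0]
    eta = phi0 + phi1 * np.array([[0], [1]]) + tau * np.array([0, 1])
    joint = omega1 / expit(eta)                       # implied P(Y1, Y2)
    beta0 = logit(joint[1].sum())                     # logit P(Y1 = 1)
    cond = joint[:, 1] / joint.sum(axis=1)            # P(Y2 = 1 | Y1 = y1)
    return dict(beta0=beta0, beta1=logit(cond[0]),
                beta2=logit(cond[1]) - logit(cond[0]), phi0=phi0, phi1=phi1)

# Shift tau by Delta_tau = 1 and re-solve for theta + Delta_theta
new_tau = tau + 1.0
new_theta = theta_from(obs_cells, obs_margins, new_tau)
new_omega1, new_omega0 = cells(**new_theta, tau=new_tau)
print(np.allclose(new_omega1, obs_cells),
      np.allclose(new_omega0.sum(axis=1), obs_margins))   # True True
```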

Let $\theta^* = \{e^{\beta_0}, e^{\beta_1}, e^{\beta_2}, e^{\phi_0}, e^{\phi_1}\}$ and $\tau^* = e^\tau$. We can derive that

$$e^{\phi_0} = \frac{\omega^{(1)}_{00} + \omega^{(1)}_{01}/\tau^*}{\omega^{(0)}_{0+}}, \qquad e^{\phi_0 + \phi_1} = \frac{\omega^{(1)}_{10} + \omega^{(1)}_{11}/\tau^*}{\omega^{(0)}_{1+}},$$

with analogous expressions for the components of $\theta^*$ corresponding to the β's. The a priori dependence of $p(\omega_I \mid \tau)$ is thus introduced by the Jacobian $\left|\partial f(\omega_I, \tau)/\partial \omega_I\right|$ in (4). This has been pointed out in Scharfstein et al. (2003) and is explored further in Wang et al. (working paper).
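A small Monte Carlo sketch (with made-up independent priors on θ and τ) shows the induced a priori dependence between τ and a coordinate of $\omega_I$:

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(1)
n = 50000
tau = rng.normal(0.0, 1.0, size=n)                    # prior draws of tau
beta0, beta1, beta2, phi0, phi1 = rng.normal(0.0, 1.0, size=(5, n))  # prior on theta

# One coordinate of omega_I: omega^(1)_{11} = P(Y1=1, Y2=1, R=1)
omega_111 = expit(beta0) * expit(beta1 + beta2) * expit(phi0 + phi1 + tau)

# tau and theta are independent a priori, yet tau and omega_I are not
print(round(np.corrcoef(tau, omega_111)[0, 1], 3))    # clearly nonzero
```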

References

  • Azzalini A. Logistic regression for autocorrelated data with application to repeated measures. Biometrika. 1994;81(4):767–775.
  • Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC; 2008.
  • Fitzmaurice GM, Laird NM, Shneyer L. An Alternative Parameterization of the General Linear Mixture Model for Longitudinal Data with Non-ignorable Drop-outs. Statistics in Medicine. 2001;20(7):1009–1021.
  • Heagerty PJ. Marginally Specified Logistic-Normal Models for Longitudinal Binary Data. Biometrics. 1999;55(3):688–698.
  • Heagerty PJ. Marginalized Transition Models and Likelihood Inference for Longitudinal Categorical Data. Biometrics. 2002;58(2):342–351.
  • Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88(421):125–134.
  • Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81(3):471–483.
  • Molenberghs G, Michiels B, Kenward MG, Diggle PJ. Monotone missing data and pattern-mixture models. Statistica Neerlandica. 1998;52(2):153–161.
  • Roy J, Daniels MJ. A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics. 2008;64:538–545.
  • Scharfstein DO, Daniels MJ, Robins JM. Incorporating prior beliefs about selection bias into the analysis of randomized trials with missing outcomes. Biostatistics. 2003;4(4):495.
  • Troxel AB, Ma G, Heitjan DF. An Index of Local Sensitivity to Nonignorability. Statistica Sinica. 2004;14(4):1221–1238.
  • Wang C, Daniels MJ. A note on identifying restriction in normal mixture models with and without covariates for incomplete data. Working paper.
  • Wang C, Daniels MJ, Scharfstein DO. Bayesian semiparametric selection model with application to a breast cancer prevention trial. Working paper.
  • Wilkins KJ, Fitzmaurice GM. A Hybrid Model for Nonignorable Dropout in Longitudinal Binary Responses. Biometrics. 2006;62(1):168–176.
  • Zhang J, Heitjan DF. A Simple Local Sensitivity Analysis Tool for Nonignorable Coarsening: Application to Dependent Censoring. Biometrics. 2006;62(4):1260–1268.