Biometrics. Author manuscript; available in PMC 2012 September 1.
Published in final edited form as:
PMCID: PMC3136648
NIHMSID: NIHMS266680

A Note on MAR, Identifying Restrictions, Model Comparison, and Sensitivity Analysis in Pattern Mixture Models With and Without Covariates for Incomplete Data

Summary

Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions are one approach to mixture model identification (Little, 1995; Little and Wang, 1996; Thijs et al., 2002; Kenward et al., 2003; Daniels and Hogan, 2008) and are a natural starting point for missing not at random sensitivity analysis (Thijs et al., 2002; Daniels and Hogan, 2008). However, when the pattern specific models are multivariate normal, identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. In addition, we propose alternative modeling and sensitivity analysis strategies under a less restrictive assumption for the distribution of the observed response data. We adopt the deviance information criterion for model comparison and perform a simulation study to evaluate the performances of the different modeling approaches. We also apply the methods to a longitudinal clinical trial. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

Keywords: Missing at random, Non-future dependence, Deviance information criterion

1. Introduction

For analyzing longitudinal studies with informative missingness, popular modeling frameworks include pattern mixture models, selection models and shared parameter models, which differ in the way the joint distribution of the outcome and the missing data process is factorized (for a comprehensive review, see Little, 1995; Hogan and Laird, 1997; Kenward and Molenberghs, 1999; Molenberghs and Kenward, 2007; Daniels and Hogan, 2008). In this paper, we concern ourselves with pattern mixture models with monotone missingness (i.e., drop-out). For pattern mixture models with non-monotone (i.e., intermittent) missingness (details of which go beyond the scope of this paper), one approach is to partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) (Harel and Schafer, 2009; Wang et al., 2010).

It is well known that pattern mixture models are not identified: the observed data do not provide enough information to identify the distributions for the incomplete patterns. The use of identifying restrictions, which equate the inestimable parameters to functions of estimable parameters, is one approach to resolving the problem (Little, 1995; Little and Wang, 1996; Thijs et al., 2002; Kenward et al., 2003; Daniels and Hogan, 2008). Common identifying restrictions include complete case missing value (CCMV) constraints and available case missing value (ACMV) constraints. Molenberghs et al. (1998) proved that, for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR), as defined by Rubin (1976) and Little and Rubin (1987). A key and attractive feature of identifying restrictions is that they do not impact the fit of the model to the observed data. Understanding the identifying restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Scharfstein et al., 2003; Zhang and Heitjan, 2006; Daniels and Hogan, 2008). In particular, MAR provides a good starting point for sensitivity analyses, which are essential for inference from incomplete data (Scharfstein et al., 1999; Vansteelandt et al., 2006; Daniels and Hogan, 2008).

The normality of response data (if appropriate) for pattern mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan, 2008). However, multivariate normality within patterns can be overly restrictive. We explore such issues in this paper.

One criticism of mixture models is that they often induce missing data mechanisms that depend on the future (Kenward et al., 2003). We explore such non-future dependence in our context here and show how mixture models that have such missing data mechanisms have fewer sensitivity parameters.

In Section 2, we show conditions under which MAR exists and does not exist when the full-data response is assumed multivariate normal within each missing-data pattern. In Sections 3 and 4, in the same setting, we explore sensitivity analysis strategies under MNAR and under non-future dependent MNAR, respectively. In Section 5, we propose a sensitivity analysis approach in which only the observed data within pattern are assumed multivariate normal. In Section 6, we apply the frameworks described in the previous sections to a randomized clinical trial for estimating the effectiveness of recombinant growth hormone for increasing muscle strength in the elderly, and propose a criterion to compare the fit of different models to the observed data. The behavior of this criterion and of the model frameworks proposed in Sections 3–5 is assessed by simulation in Section 7. In Section 8, we show that in the presence of baseline covariates with time-invariant coefficients, standard identifying restrictions cause over-identification of the baseline covariate effects, and we propose a remedy. We provide conclusions and discussion in Section 9.

2. Existence of MAR under Multivariate Normality within Pattern

Let Y be a J-dimensional longitudinal response vector with components scheduled to be measured at time points t_j (j ∈ {1, …, J}); this is the full-data response. Without loss of generality, we assume Y_1 is always observed. Let S = s denote the number of observed responses (s = 1, 2, …, J), corresponding to the follow-up time t_s. Let Ȳ_j denote the historical response vector (Y_1, Y_2, …, Y_j). Finally, we define p_s(·) = p(· | S = s).

We show that MAR does not necessarily exist when it is assumed that

Y | S = s ~ N(μ^{(s)}, Σ^{(s)})  for all s.
(1)

To see this, we introduce some further notation. Let

μ^{(s)}(j) = E(Ȳ_j | S = s) = ( μ_1^{(s)}(j), μ_2^{(s)}(j) )^T

and

Σ^{(s)}(j) = Var(Ȳ_j | S = s) = [ Σ_{11}^{(s)}(j)  Σ_{12}^{(s)}(j) ; Σ_{21}^{(s)}(j)  Σ_{22}^{(s)}(j) ]

where μ_1^{(s)}(j) = E(Ȳ_{j−1} | S = s), μ_2^{(s)}(j) = E(Y_j | S = s), Σ_{11}^{(s)}(j) = Var(Ȳ_{j−1} | S = s), Σ_{22}^{(s)}(j) = Var(Y_j | S = s), Σ_{12}^{(s)}(j) = Cov(Ȳ_{j−1}, Y_j | S = s), and Σ_{21}^{(s)}(j) is the transpose of Σ_{12}^{(s)}(j).

Lemma 1

For monotone dropout, under the model given in (1), define

κ_1^{(s)}(j) = Σ_{21}^{(s)}(j) (Σ_{11}^{(s)}(j))^{−1}
κ_2^{(s)}(j) = μ_2^{(s)}(j) − κ_1^{(s)}(j) μ_1^{(s)}(j)
κ_3^{(s)}(j) = Σ_{22}^{(s)}(j) − Σ_{21}^{(s)}(j) (Σ_{11}^{(s)}(j))^{−1} Σ_{12}^{(s)}(j).

The condition that, for a given j, the conditional distributions p_s(y_j | ȳ_{j−1}) are identical for all s is equivalent to κ_1^{(s)}(j), κ_2^{(s)}(j) and κ_3^{(s)}(j) being constant in s.

Proof

The proof is trivial since

Y_j | Ȳ_{j−1}, S = s ~ N( κ_2^{(s)}(j) + κ_1^{(s)}(j) Ȳ_{j−1},  κ_3^{(s)}(j) ).

In other words, if the condition in Lemma 1 is satisfied, then there exists a conditional distribution p_{≥j}(y_j | ȳ_{j−1}) such that p_s(y_j | ȳ_{j−1}) = p_{≥j}(y_j | ȳ_{j−1}) for all s ≥ j. We now state a theorem that gives the restrictions on the model given in (1) needed for MAR to exist. Note that the proofs of the rest of the theorems and corollaries in this and the subsequent section can be found in Web Appendix A.
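As an illustrative sketch (all moments below are hypothetical, not from the paper), the condition of Lemma 1 can be checked numerically from pattern-specific moments; `kappas` and `mar_exists` are names introduced here for illustration:

```python
import numpy as np

# Illustrative sketch (hypothetical moments): check the condition of Lemma 1
# numerically from pattern-specific means and covariance matrices.

def kappas(mu, Sigma, j):
    """kappa_1, kappa_2, kappa_3 of the conditional [Y_j | Ybar_{j-1}, S = s]."""
    S11 = Sigma[:j - 1, :j - 1]          # Var(Ybar_{j-1} | S = s)
    S21 = Sigma[j - 1, :j - 1]           # Cov(Y_j, Ybar_{j-1} | S = s)
    k1 = np.linalg.solve(S11, S21)       # kappa_1 = Sigma_21 Sigma_11^{-1}
    k2 = mu[j - 1] - k1 @ mu[:j - 1]     # kappa_2 (conditional intercept)
    k3 = Sigma[j - 1, j - 1] - S21 @ k1  # kappa_3 (residual variance)
    return k1, k2, k3

def mar_exists(mus, Sigmas, j, tol=1e-8):
    """True if kappa_1, kappa_2, kappa_3 are constant across the patterns."""
    ks = [kappas(m, S, j) for m, S in zip(mus, Sigmas)]
    return all(np.allclose(a, b, atol=tol)
               for k in ks[1:] for a, b in zip(ks[0], k))
```

Two patterns share the conditional [Y_j | Ȳ_{j−1}] exactly when their κ's agree, even though their marginal means and variances may differ.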

Theorem 1

For pattern mixture models with monotone dropout, under the model given in (1), identification via MAR constraints exists if and only if μ^{(s)} and Σ^{(s)} satisfy Lemma 1 for s ≥ j and 1 < j < J.

So, a default approach for continuous Y, assuming the full data response is multivariate normal within pattern, does not allow an MAR restriction (unless the restrictions in Theorem 1 are imposed).

We now examine the corresponding missing data mechanism (MDM), S | Y. We use "≅" to denote equality in distribution.

Corollary 1

For pattern mixture models of the form (1) with monotone dropout, MAR holds if and only if S | Y ≅ S | Y_1.

Thus, the implicit MDM is very restrictive: it depends only on Y_1 and not on the rest of the observed history Ȳ_s. We now show connections to missing completely at random (MCAR) and other common identifying restrictions.

Corollary 2

For pattern mixture models of the form (1) with monotone dropout, MCAR is equivalent to MAR if p_s(y_1) = p(y_1) for all s.

Corollary 3

For pattern mixture models of the form (1) with monotone dropout, MAR constraints are identical to complete case missing value (CCMV) and neighboring case missing value (NCMV) constraints.

The results in this section were all based on specifying the mixture model in (1) and demonstrate that MAR only exists under the fairly strict conditions given in Theorem 1.

3. Sequential Model Specification and Sensitivity Analysis under MAR

Due to the structure of μ^{(s)} and Σ^{(s)} under MAR constraints as outlined in Section 2, we propose to follow the approach in Daniels and Hogan (2008, Chapter 8) and specify the distributions of the observed Y within pattern as:

p_s(y_1) ~ N(μ_1^{(s)}, σ_1^{(s)}),  1 ≤ s ≤ J
p_s(y_j | ȳ_{j−1}) ~ N(μ_{j|j̄}^{(≥j)}, σ_{j|j̄}^{(≥j)}),  2 ≤ j ≤ s ≤ J
(2)

where j̄ = {1, 2, …, j − 1}. Note that, by construction, the p_s(y_j | ȳ_{j−1}) are assumed identical for all j ≤ s ≤ J. Consequently, we have p_s(y_j | ȳ_{j−1}) = p(y_j | ȳ_{j−1}, S ≥ j), which we denote by p_{≥j}(y_j | ȳ_{j−1}).

Corollary 4

For pattern mixture models of the form (1) with monotone dropout, identification via MAR constraints exists if and only if the observed data can be modeled as (2).

Corollary 4 implies that under the multivariate normality assumption in (1) and the MAR assumption, a sequential specification as in (2) always exists.

We provide some details for MAR in model (1) (which implies the specification in (2) as stated in Corollary 4) next. Distributions for missing data (which are not identified) are specified as:

p_s(y_j | ȳ_{j−1}) ~ N(μ_{j|j̄}^{(<j)}, σ_{j|j̄}^{(<j)}),  1 ≤ s < j ≤ J.

The conditional mean structures of μ_{j|j̄}^{(≥j)} and μ_{j|j̄}^{(<j)} are parameterized as follows:

μ_{j|j̄}^{(≥j)} = β_0^{(≥j)} + Σ_{l=1}^{j−1} β_l^{(≥j)} y_l
μ_{j|j̄}^{(<j)} = β_0^{(<j)} + Σ_{l=1}^{j−1} β_l^{(<j)} y_l.

To identify the full-data model, the MAR constraints require that

p_k(y_j | ȳ_{j−1}) = p_{≥j}(y_j | ȳ_{j−1})

for k < j, which implies that μ_{j|j̄}^{(<j)} = μ_{j|j̄}^{(≥j)} and σ_{j|j̄}^{(<j)} = σ_{j|j̄}^{(≥j)} for 2 ≤ j ≤ J. Since the equality of the conditional means needs to hold for all values of ȳ_{j−1}, this further implies that the MAR assumption requires β_l^{(<j)} = β_l^{(≥j)} for 0 ≤ l < j ≤ J.

The motivation for the proposed sequential model is that it allows a straightforward extension of the MAR specification to a large class of MNAR models indexed by parameters measuring departures from MAR, along with the attraction of doing sensitivity analysis on means and/or variances in normal models.

For example, we can let

β_l^{(<j)} = Δ_l^{(j)} + β_l^{(≥j)}  and  log σ_{j|j̄}^{(<j)} = Δ_σ^{(j)} + log σ_{j|j̄}^{(≥j)}

for all j > 1 and 0 ≤ l < j. Sensitivity analysis can be done on these Δ parameters, which capture the information about the missing data mechanism (see Web Appendix B for the impact of the Δ parameters on the MDM). For example, in a Bayesian framework, we may assign informative priors elicited from experts to these sensitivity parameters Δ. Note that in general we may have separate Δ_l^{(j)} and Δ_σ^{(j)} for each pattern s (s < j), but in practice it is necessary to limit their dimensionality (Daniels and Hogan, 2008). Indeed, we could make Δ_l^{(j)} and Δ_σ^{(j)} independent of j to further reduce the number of sensitivity parameters.
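For instance, a draw of a missing Y_j under the Δ-shifted conditional can be sketched as follows (all numerical values are hypothetical, chosen only to illustrate the parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for the identified conditional [Y_j | Ybar_{j-1}, S >= j]
# with j = 3: intercept/slopes beta and residual variance sigma2.
beta = np.array([1.0, 0.4, 0.5])    # beta_0, beta_1, beta_2
sigma2 = 4.0

# Hypothetical MNAR departures from MAR: shift the intercept by Delta_0 and
# the log residual variance by Delta_sigma.
delta = np.array([-2.0, 0.0, 0.0])  # Delta_0, Delta_1, Delta_2
delta_sigma = 0.0

def draw_missing(y_hist, n=1):
    """Draw Y_j for a dropout pattern s < j from the Delta-shifted normal."""
    mean = (beta[0] + delta[0]) + (beta[1:] + delta[1:]) @ y_hist
    sd = np.sqrt(np.exp(delta_sigma) * sigma2)
    return rng.normal(mean, sd, size=n)

# Setting all Delta's to zero recovers the MAR (ACMV) conditional.
```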

In general, the MDM depends on Ȳ_J, i.e. is MNAR, in the presence of the Δ parameters. However, one might want the hazard of dropout at time t_s to depend only on Ȳ_{s+1}, in which case we need different distributions and assumptions on [Y_j | Ȳ_{j−1}, S = k] for k < j − 1 and j > 2, as shown in the next section.

4. Non-future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern

Non-future dependence assumes that missingness only depends on observed data and the current missing value, i.e.

[S = s | Y] ≅ [S = s | Ȳ_{s+1}],

and can be viewed as a special case of MNAR and an extension of MAR (Kenward et al., 2003). Kenward et al. (2003) showed that non-future dependence holds if and only if for each j ≥ 3 and k < j − 1,

p_k(y_j | ȳ_{j−1}) = p_{≥j−1}(y_j | ȳ_{j−1}).

An approach to implement non-future dependence within the framework of Section 3 is as follows. We model the observed data as in (2). For the conditional distribution of the current missing data (Y_{s+1}), we assume that

p_s(y_{s+1} | ȳ_s) ~ N( β_0^{(≥s+1)} + Δ_0^{(s+1)} + Σ_{l=1}^{s} (β_l^{(≥s+1)} + Δ_l^{(s+1)}) y_l,  e^{Δ_σ^{(s+1)}} σ_{s+1|{1,…,s}}^{(≥s+1)} ),  2 ≤ s < J

and for the conditional distribution of the future missing data (Y_{s+2}, …, Y_J), we assume that

p_s(y_j | ȳ_{j−1}) = p_{≥j−1}(y_j | ȳ_{j−1}),  2 ≤ s < j − 1 ≤ J − 1,

where

p_{≥j−1}(y_j | ȳ_{j−1}) = [p(S = j−1)/p(S ≥ j−1)] p_{j−1}(y_j | ȳ_{j−1}) + [p(S ≥ j)/p(S ≥ j−1)] p_{≥j}(y_j | ȳ_{j−1}).

Note that, by this approach, although the model for the future missing data is a mixture of normals, the sensitivity parameters are kept the same as in Section 3 (Δ_l^{(j)} and Δ_σ^{(j)}, j = 2, …, J and l = 0, …, j − 1). In addition, this significantly reduces the number of potential sensitivity parameters: for J-dimensional longitudinal data, the total number of sensitivity parameters is reduced from (2J^3 + 3J^2 + J)/6 − J to (J^2 + 3J − 4)/2; for J = 3 (6), from 11 (85) to 7 (25). Further reduction is typically needed; see the data example in Section 6 as an illustration. If all of the remaining sensitivity parameters are set to zero, we have p_s(y_{s+1} | ȳ_s) = p_{≥s+1}(y_{s+1} | ȳ_s) for 2 ≤ s < J and p_s(y_j | ȳ_{j−1}) = p_{≥j−1}(y_j | ȳ_{j−1}) for 2 ≤ s < j − 1 ≤ J − 1, which together imply p_s(y_j | ȳ_{j−1}) = p_{≥j}(y_j | ȳ_{j−1}) for all s < j, i.e. MAR.
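The sensitivity-parameter counts quoted above can be verified with a few lines of arithmetic (a sketch using the formulas in the text; the function names are ours):

```python
# Arithmetic check of the sensitivity-parameter counts quoted in the text.

def n_sens_full(J):
    """Unrestricted MNAR: Delta_l^{(j)} and Delta_sigma^{(j)} per pattern s < j."""
    return (2 * J**3 + 3 * J**2 + J) // 6 - J

def n_sens_nfd(J):
    """Non-future dependence: one set of Delta's per time j = 2, ..., J."""
    return (J**2 + 3 * J - 4) // 2

for J in (3, 6):
    print(J, n_sens_full(J), n_sens_nfd(J))  # 3: 11 -> 7; 6: 85 -> 25
```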

5. MAR and Sensitivity Analysis with Multivariate Normality on the Observed-data Response

If we assume multivariate normality only for the observed-data response, Y_obs | S, instead of the full-data response, Y | S, we can weaken the restrictions on p_s(y_j | ȳ_{j−1}) for s ≥ j and allow the MDM to incorporate all of the observed data under MAR (cf. Corollary 1).

For example, we may specify the distribution of Y_obs | S as follows:

p_s(y_1) ~ N(μ_1^{(s)}, σ_1^{(s)}),  1 ≤ s ≤ J
p_s(y_j | ȳ_{j−1}) ~ N(μ_{j|j̄}^{(s)}, σ_{j|j̄}^{(s)}),  2 ≤ j ≤ s ≤ J

where

μ_{j|j̄}^{(s)} = β_{j,0}^{(s)} + Σ_{l=1}^{j−1} β_{j,l}^{(s)} y_l.

To identify the full-data model, recall that the MAR constraints imply that

p_s(y_j | ȳ_{j−1}) = p_{≥j}(y_j | ȳ_{j−1}) = Σ_{k=j}^{J} [P(S = k)/P(S ≥ j)] p_k(y_j | ȳ_{j−1})
(3)

for s < j, which are mixtures of normals. For sensitivity analysis in this mixture-of-normals setting, we propose to introduce sensitivity parameters Δ_μ (location) and Δ_σ (scale) such that for s < j

p_s(y_j | ȳ_{j−1}) = e^{−Δ_σ^{(j)}} Σ_{k=j}^{J} π_{j,k} p_k( (y_j − Δ_μ^{(j)} − (1 − e^{Δ_σ^{(j)}}) μ_{j|j̄}^{(k)}) / e^{Δ_σ^{(j)}} | ȳ_{j−1} )
(4)

where π_{j,k} = P(S = k)/P(S ≥ j). The rationale for this parameterization is that each p_k(· | ȳ_{j−1}) in the summation then has mean Δ_μ^{(j)} + μ_{j|j̄}^{(k)} and variance e^{2Δ_σ^{(j)}} σ_{j|j̄}^{(k)}. To reduce the dimension of the sensitivity parameters, we could make Δ_μ^{(j)} and Δ_σ^{(j)} common for all j (namely, Δ_μ and Δ_σ).
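A small Monte Carlo sketch (hypothetical weights, means and variances) illustrates the effect of the location/scale parameterization in (4): each mixture component is shifted by Δ_μ and rescaled by e^{Δ_σ}:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mixture [Y_j | ybar_{j-1}, S >= j]: component means, variances,
# and weights pi_k = P(S = k) / P(S >= j).
mu = np.array([0.0, 1.0, 3.0])
s2 = np.array([1.0, 2.0, 1.5])
pi = np.array([0.5, 0.3, 0.2])

d_mu, d_sig = -1.0, 0.2  # location and scale sensitivity parameters

def draw_mnar(n):
    """Draw from the shifted/scaled mixture: component k gets mean
    d_mu + mu[k] and variance exp(2 * d_sig) * s2[k]."""
    k = rng.choice(len(pi), size=n, p=pi)
    return rng.normal(d_mu + mu[k], np.exp(d_sig) * np.sqrt(s2[k]))

# The implied mean is d_mu + sum_k pi_k mu[k]: a pure Delta_mu shift of the
# MAR mean, while exp(d_sig) rescales every component's spread.
```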

In this set up, we have

μ_{j|j̄}^{(s),MNAR} = Δ_μ^{(j)} + Σ_{k=j}^{J} π_{j,k} μ_{j|j̄}^{(k)}

and

σ_{j|j̄}^{(s),MNAR} = e^{2Δ_σ^{(j)}} Σ_{k=j}^{J} π_{j,k} σ_{j|j̄}^{(k)} + W_j,

where W_j is the between-pattern variability of the conditional means,

W_j = Σ_{k=j}^{J} π_{j,k} (μ_{j|j̄}^{(k)})^2 − ( Σ_{k=j}^{J} π_{j,k} μ_{j|j̄}^{(k)} )^2

(see Web Appendix C for details). Note that W_j does not depend on σ_{j|j̄}^{(k)} for k = j, …, J.

Under the MAR assumption (3), for [Y_j | Ȳ_{j−1}, S = s], we have

μ_{j|j̄}^{(s),MAR} = Σ_{k=j}^{J} π_{j,k} μ_{j|j̄}^{(k)}

and

σ_{j|j̄}^{(s),MAR} = Σ_{k=j}^{J} π_{j,k} ( σ_{j|j̄}^{(k)} + (μ_{j|j̄}^{(k)})^2 ) − ( μ_{j|j̄}^{(s),MAR} )^2.

Therefore, under the MNAR assumption (4), the two sensitivity parameters control the departure of the mean and variance from MAR through

μ_{j|j̄}^{(s),MNAR} = Δ_μ^{(j)} + μ_{j|j̄}^{(s),MAR}
σ_{j|j̄}^{(s),MNAR} = e^{2Δ_σ^{(j)}} ( σ_{j|j̄}^{(s),MAR} − W_j ) + W_j,

with Δ_μ^{(j)} being a location parameter and Δ_σ^{(j)} being a scale parameter. The MNAR class allows MAR: set Δ_μ^{(j)} = Δ_σ^{(j)} = 0 for all j ≥ 2.

By assuming non-future dependence, we obtain, for the future missing data (2 ≤ s < j − 1 ≤ J − 1),

p_s(y_j | ȳ_{j−1}) = p_{≥j−1}(y_j | ȳ_{j−1})
 = [p(S = j−1)/p(S ≥ j−1)] e^{−Δ_σ^{(j)}} Σ_{k=j}^{J} π_{j,k} p_k( (y_j − Δ_μ^{(j)} − (1 − e^{Δ_σ^{(j)}}) μ_{j|j̄}^{(k)}) / e^{Δ_σ^{(j)}} | ȳ_{j−1} ) + Σ_{k=j}^{J} [P(S = k)/p(S ≥ j−1)] p_k(y_j | ȳ_{j−1}),

and (4) for the current missing data (j = s + 1). The number of sensitivity parameters in this setup is reduced from J(J − 1) by (J − 2)(J − 1) to 2(J − 1); so, for J = 3 (6), from 6 (30) to 4 (10). Further reductions are illustrated in Section 6.

6. Example: Growth Hormone Study

We analyze a longitudinal clinical trial using the frameworks from Sections 4 and 5, which assume multivariate normality for the full-data response within pattern (MVN) or multivariate normality for the observed-data response within pattern (OMVN), respectively. We assume non-future dependence for the missing data mechanism to minimize the number of sensitivity parameters.

The growth hormone (GH) trial was a randomized clinical trial conducted to estimate the effectiveness of recombinant human growth hormone therapy for increasing muscle strength in the elderly. The trial had four treatment arms: placebo (P), growth hormone only (G), exercise plus placebo (EP), and exercise plus growth hormone (EG). Muscle strength, here mean quadriceps strength (QS), the maximum foot-pounds of torque that can be exerted against resistance provided by a mechanical device, was measured at baseline, 6 months and 12 months. There were 161 participants enrolled in this study, but only (roughly) 75% of them completed the 12-month follow-up. Researchers believed that dropout was related to the unobserved strength measures at the dropout times.

For illustration, we confine our attention to the two arms using exercise: exercise plus growth hormone (EG) and exercise plus placebo (EP). Table 1 contains the observed data for the two arms.

Table 1
Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.

Let (Y_1, Y_2, Y_3) denote the full-data response corresponding to baseline, 6 months, and 12 months. Let Z be the treatment indicator (1 = EG, 0 = EP). Our goal is to draw inference about the mean difference in QS between the two treatment arms at month 12, that is, the treatment effect θ = E(Y_3 | Z = 1) − E(Y_3 | Z = 0). In the full-data model for each treatment under non-future dependence, there are seven sensitivity parameters for the MVN model, {Δ_0^{(2)}, Δ_1^{(2)}, Δ_0^{(3)}, Δ_1^{(3)}, Δ_2^{(3)}, Δ_σ^{(2)}, Δ_σ^{(3)}}, and four sensitivity parameters for the OMVN model, {Δ_μ^{(2)}, Δ_μ^{(3)}, Δ_σ^{(2)}, Δ_σ^{(3)}}; see Web Appendix D for details on the models. For the MNAR analysis, we reduced the number of sensitivity parameters as follows:

  • Δ_σ^{(2)} and Δ_σ^{(3)} do not appear in the posterior distribution of E(Y_3 | Z) for Z = 0, 1, and thus are not necessary for inference on θ.
  • We restrict to MNAR departures from MAR in the intercept terms by assuming Δ_1^{(2)} = Δ_1^{(3)} = Δ_2^{(3)} ≡ 0.
  • We assume the sensitivity parameters are identical between treatments.

This reduces the set of sensitivity parameters to {Δ_0^{(2)}, Δ_0^{(3)}} for the MVN model and {Δ_μ^{(2)}, Δ_μ^{(3)}} for the OMVN model.

There are a variety of ways to specify priors for the sensitivity parameters Δ_0^{(2)} and Δ_0^{(3)}, where

Δ_0^{(2)} = E(Y_2 | Y_1, S = 1) − E(Y_2 | Y_1, S ≥ 2)
Δ_0^{(3)} = E(Y_3 | Y_2, Y_1, S = 2) − E(Y_3 | Y_2, Y_1, S = 3).

Both represent the difference in conditional means between the unobserved and observed responses. Δ_μ^{(2)} and Δ_μ^{(3)} have (roughly) the same interpretation as Δ_0^{(2)} and Δ_0^{(3)}.

Based on discussion with investigators, we made the assumption that dropouts do worse than completers; thus, we restrict the Δ's to be less than or equal to zero. To do a fully Bayesian analysis that fairly characterizes the uncertainty associated with the missing data mechanism, we assume a uniform prior for the Δ's as a default choice. Subject matter considerations gave an upper bound of zero for the uniform distributions. We set the lower bounds using the variability of the observed data as follows. We estimate the residual variances of Y_2 | Y_1 and Y_3 | Y_2, Y_1 using the observed data; we denote these by τ_{2|1} and τ_{3|2,1} respectively. We use the square roots of these estimates as the lower bounds. In particular, we specify the priors for {Δ_0^{(2)}, Δ_0^{(3)}} as well as {Δ_μ^{(2)}, Δ_μ^{(3)}} as Unif(R(τ)), where

R(τ) = [−τ_{2|1}^{1/2}, 0] × [−τ_{3|2,1}^{1/2}, 0].
(5)

Based on the estimates τ_{2|1}^{1/2} = 18 and τ_{3|2,1}^{1/2} = 12, the priors are uniform on [−18, 0] × [−12, 0] for {Δ_0^{(2)}, Δ_0^{(3)}} and for {Δ_μ^{(2)}, Δ_μ^{(3)}}. For the other parameters in the full-data model, we assign N(0, 10^6) priors for mean parameters (μ, β) and Unif(0, 100) priors for standard deviation parameters (σ^{1/2}).
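A sketch of how the lower bounds of the uniform prior support might be computed from observed-data residual variances, using simulated stand-in data (all numbers here are hypothetical, not the trial data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-in for the observed complete cases (columns Y1, Y2, Y3);
# the means and covariances are hypothetical.
Y = rng.multivariate_normal([60.0, 70.0, 72.0],
                            [[225.0, 180.0, 150.0],
                             [180.0, 400.0, 300.0],
                             [150.0, 300.0, 420.0]], size=200)

def residual_var(y, X):
    """OLS residual variance of y regressed on X (with an intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = y - X1 @ beta
    return r @ r / (len(y) - X1.shape[1])

tau_21 = residual_var(Y[:, 1], Y[:, [0]])      # estimate of Var(Y2 | Y1)
tau_321 = residual_var(Y[:, 2], Y[:, [0, 1]])  # estimate of Var(Y3 | Y1, Y2)

# Uniform prior support for the Delta's, in the spirit of (5):
support = [(-np.sqrt(tau_21), 0.0), (-np.sqrt(tau_321), 0.0)]
```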

We fit the models using WinBUGS, with multiple chains of 25,000 iterations and a burn-in of 4,000. Convergence was checked by examining trace plots of the multiple chains.

The results for the MVN and OMVN models are given in Table 2. Under MNAR, the posterior mean (posterior standard deviation) of the difference in quadriceps strength at 12 months between the two treatment arms was 4.0 (8.9) for the MVN model and 4.4 (10) for the OMVN model. Under MAR, the differences were 5.4 (8.8) and 5.8 (9.9), respectively. The differences were smaller under MNAR because the assumption that dropouts do worse than completers lowers the estimated quadriceps strength at 12 months.

Table 2
Growth Hormone Study: Posterior mean (standard deviation) stratified by treatment.

To compare the fit of the OMVN and MVN models, we used the deviance information criterion based on the observed-data likelihood (DIC_O),

DIC_O = −4 E{log L(θ | y_obs, S)} + 2 log L{E(θ | y_obs, S) | y_obs, S},
(6)

where the expectations are taken with respect to the posterior distribution of θ. The results favored the OMVN model (Table 2). The behavior of DIC_O for model selection in the setting of incomplete data is explored in Section 7. Note that, given the OMVN or MVN specification for the observed data, the fit (as measured by DIC_O) was equivalent across the different missingness mechanisms, as it should be, since the observed data provide no information about the mechanism. We conclude that the treatment difference θ was not significantly different from zero.
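Computing DIC_O from MCMC output only requires the observed-data log-likelihood evaluated at each posterior draw and at the posterior mean. A minimal sketch, with a toy normal likelihood standing in for the models of this paper:

```python
import numpy as np

# Sketch: DIC_O from MCMC output, assuming obs_loglik(theta) evaluates the
# observed-data log-likelihood (here a toy N(theta, 1) likelihood).

def dic_o(theta_draws, obs_loglik):
    """DIC_O = -4 E[log L(theta | y_obs, S)] + 2 log L(E[theta] | y_obs, S)."""
    mean_ll = np.mean([obs_loglik(t) for t in theta_draws])
    ll_at_mean = obs_loglik(np.mean(theta_draws, axis=0))
    return -4.0 * mean_ll + 2.0 * ll_at_mean

# Toy illustration:
rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, size=50)

def ll(theta):
    return float(-0.5 * np.sum((y - theta) ** 2))  # up to an additive constant

draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=(2000, 1))
dic = dic_o(draws, lambda t: ll(t[0]))
```

When all draws coincide at a single θ, DIC_O reduces to −2 log L(θ), as the penalty term vanishes.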

To assess sensitivity to the informative priors Unif(R(τ)), we evaluated several scenarios, making the priors more or less informative by modifying the range (see Table 3). Specifically, we considered (1) R(τ) = [−20, 0] × [−20, 0], (2) R(τ) = [−9, 0] × [−6, 0], (3) R(τ) = [−20, 0] × [−6, 0] and (4) R(τ) = [−9, 0] × [−20, 0]; note that the lower bounds of −9 and −6 are τ_{2|1}^{1/2}/2 and τ_{3|2,1}^{1/2}/2, respectively. Based on subject matter considerations, we assumed the difference in conditional means between the observed and unobserved responses did not exceed 20 foot-pounds of torque. None of the scenarios considered resulted in a significant mean treatment difference at month 12.

Table 3
Growth Hormone Study: MNAR Sensitivity Analysis

7. Model Comparison and Simulations

Although the specifications in Sections 3 and 4 arguably offer a simpler sensitivity analysis than that in Section 5, the former may not fit the observed data as well (as indicated by DIC_O in Section 6). In addition, if the conclusions from the OMVN and MVN models were substantially different (unlike in the example here), we would need to decide which model to use for inference.

Spiegelhalter et al. (2002) developed the deviance information criterion (DIC) for Bayesian model comparison. In the setting of incomplete data, many alternative representations of DIC have been proposed (Celeux et al., 2006; Daniels and Hogan, 2008). The underlying complication is the fact that we do not observe the full data. The recommendations in Celeux et al. (2006) were based on missing data that were actually latent variables in random effects and mixture models, not potentially observable missing data as is our focus. Daniels and Hogan (2008) recommended DIC_O in (6) on intuitive grounds but did not explore its operating characteristics. The following simulation study explores the behavior of DIC_O, as well as the behavior of the MVN and OMVN models.

We simulated observed data from the MVN and OMVN models. Parameters estimated by fitting the MVN and OMVN models to the observed EG arm data from the growth hormone study were the basis for the simulation (see Table 4).

Table 4
Simulation Scenario: Parameters based on the observed EG arm from the growth hormone study.

For each sample size (50, 150 and 300), we simulated 500 datasets under the MVN and OMVN observed-data models. The performance of the different approaches was assessed using both DIC_O and the mean squared error (MSE) of E(Y_j), j = 2, 3; see Table 5. Note that we computed the "true" E(Y_2) and E(Y_3) for the MNAR cases using the uniform prior with R(τ) = [−18, 0] × [−12, 0] from Section 6. For reference, the MSEs associated with the true data-generating model are bolded.

Table 5
Simulation Results: Comparison of MSE and performance of DICO in MVN and OMVN models. The columns correspond to the models fit and the rows to the scenario under which the data was generated.

The simulations favored the correctly specified model with respect to both DIC_O and MSE. As the sample size increased, the MSEs decreased and the difference between the correct and incorrect models increased. The MSE for E(Y_j), j = 2, 3, provides information on the fit of the full-data response model (both the observed and missing data), not just the observed-data model; however, all that can be checked from the observed data is the fit of the observed-data response model. Based on this (limited) simulation, we conclude that DIC_O is a reasonable criterion for choosing between different incomplete-data models and has more power when the sample size is moderate or large. Sensitivity analysis and all inferences can then be conducted using the model that fits the observed data best. The simulation results also showed that when the MVN specification is true, OMVN is competitive, but not vice versa. This is reasonable because OMVN is the conservative choice here, given that MVN is nested within it.

To compare the robustness of the MVN and OMVN models when neither is correct, we considered selection models where the distribution of Y is heavy-tailed or asymmetric. Specifically, we simulated response data from a multivariate t (MVT) distribution, Y ~ MVT(μ, Σ, df), with df the degrees of freedom, and a multivariate skew-normal (MSKN) distribution, Y ~ MSKN(μ, Σ, ω), with ω the skewness factor (Azzalini and Dalla Valle, 1996). See Web Appendix E for details of the data generation for these cases.

The average DICO are reported in Table 6. Note that MAR and MNAR cases correspond to different observed data models in this simulation setting. Therefore, we report DICO for MAR and MNAR separately.

Table 6
Simulation Results: Assessing fit of MVN and OMVN models using DICO under heavy-tailed and skewed data generating mechanisms. Values in the table are DICO(rate in favor). Rate in favor corresponds to the percentage of simulated datasets where the model ...

From the simulation, we can see that the MVN model is more likely to be selected by DIC_O when the sample size is small. When the sample size is moderate or large, the OMVN model fits the observed data better in all the cases considered and offers more robustness (in particular in the heavy-tailed scenario). OMVN appears to be the better default choice for medium to large sample sizes.

8. ACMV Restrictions and Multivariate Normality with Baseline Covariates

In this section, we show that common identifying restrictions over-identify estimable parameters in the presence of baseline covariates with time invariant coefficients and offer a solution.

Consider the situation where Y = (Y_1, Y_2) is a bivariate normal response (J = 2) with missing data only in Y_2, i.e. S = 1 or 2. Assume there are baseline covariates X with time-invariant coefficients α. We model p(S) and p(Y | S) as follows:

S | X ~ Bern(φ(X))
Y | S = s ~ N(μ^{(s)}(X), Σ^{(s)}),  s = 1, 2

where

μ^{(s)}(X) = ( μ_1^{(s)} + Xα^{(s)}, μ_2^{(s)} + Xα^{(s)} )^T  and  Σ^{(s)} = [ σ_{11}^{(s)}  σ_{12}^{(s)} ; σ_{21}^{(s)}  σ_{22}^{(s)} ].

MAR (ACMV) implies the following restriction

[Y_2 | Y_1, X, S = 1] ≅ [Y_2 | Y_1, X, S = 2].

This implies that the conditional means, E(Y_2 | Y_1, X, S = s) for s = 1, 2, are equal, i.e.

μ_2^{(1)} + Xα^{(1)} + (σ_{21}^{(1)}/σ_{11}^{(1)}) (Y_1 − μ_1^{(1)} − Xα^{(1)}) = μ_2^{(2)} + Xα^{(2)} + (σ_{21}^{(2)}/σ_{11}^{(2)}) (Y_1 − μ_1^{(2)} − Xα^{(2)}).
(7)

For (7) to hold for all Y_1 and X, we need

α^{(1)} = α^{(2)}.

However, both α^{(1)} and α^{(2)} are already identified by the observed data on Y_1. Thus the ACMV (MAR) restriction affects the fit of the model to the observed data, which is against the principle of applying identifying restrictions (Little and Wang, 1996; Daniels and Wang, 2009).

To resolve the over-identification issue, we propose to apply MAR constraints on residuals instead of directly on the responses. In the bivariate case, the corresponding restriction is

[Y_2 − Xα^{(1)} | Y_1 − Xα^{(1)}, X, S = 1] ≅ [Y_2 − Xα^{(2)} | Y_1 − Xα^{(2)}, X, S = 2].
(8)

Since the conditional distributions

[Y_2 − Xα^{(s)} | Y_1 − Xα^{(s)} = r, X, S = s] ~ N( μ_2^{(s)} + (σ_{21}^{(s)}/σ_{11}^{(s)}) (r − μ_1^{(s)}),  σ_{22}^{(s)} − (σ_{21}^{(s)})^2/σ_{11}^{(s)} )

do not involve α^{(s)} for s = 1, 2, the restriction (8) places no constraints on α^{(s)}, thus avoiding over-identification.
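A quick simulation sketch (hypothetical parameter values) of why the residual-based restriction places no constraints on α: the regression of the residual Y_2 − Xα on Y_1 − Xα recovers σ_21/σ_11 regardless of α:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical demo: with Y = mu + X * alpha + e, e ~ N(0, Sigma), regressing
# the residual Y2 - X*alpha on Y1 - X*alpha does not involve alpha.
n = 200_000
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
X = rng.normal(0.0, 2.0, size=n)

def resid_slope(alpha):
    e = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
    Y = mu + np.column_stack([X * alpha, X * alpha]) + e
    r1, r2 = Y[:, 0] - X * alpha, Y[:, 1] - X * alpha
    return np.polyfit(r1, r2, 1)[0]  # estimates sigma_21 / sigma_11 = 0.5

# resid_slope(0.0) and resid_slope(5.0) both estimate the same slope.
```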

The MDM corresponding to ACMV (MAR) on the residuals is given by

log [ P(S = 1 | Y, X) / P(S = 2 | Y, X) ]
 = log [ φ(X) / (1 − φ(X)) ] − (1/(2σ*)) { (1 − B)^2 X (α^{(1)} α^{(1)T} − α^{(2)} α^{(2)T}) X^T − 2(1 − B) (Y_2 − Δ(Y_1)) X (α^{(1)} − α^{(2)})^T } − (1/2) log [ σ_{11}^{(1)} / σ_{11}^{(2)} ] − (Y_1 − Xα^{(1)} − μ_1^{(1)})^2 / (2σ_{11}^{(1)}) + (Y_1 − Xα^{(2)} − μ_1^{(2)})^2 / (2σ_{11}^{(2)}),

where σ* = σ_{22}^{(2)} − (σ_{21}^{(2)})^2/σ_{11}^{(2)}, B = σ_{21}^{(2)}/σ_{11}^{(2)} and Δ(Y_1) = μ_2^{(2)} + (σ_{21}^{(2)}/σ_{11}^{(2)})(Y_1 − μ_1^{(2)}). Hence, by assuming MAR on the residuals, the MDM is a quadratic function of Y_1 but is independent of Y_2 if and only if α^{(2)} = α^{(1)}. In other words, assumption (8) implies MAR if and only if α^{(2)} = α^{(1)}. So, in general, MAR on the residuals does not imply that the missingness in Y_2 is MAR. However, it is an identifying restriction that does not impact the fit of the model to the observed data. CCMV and NCMV restrictions can be applied similarly to the residuals.

Remark

In general, μ_l^{(s)} can be replaced by μ_{il}^{(s)} if there are subject-specific covariates with time-varying coefficients.

The ACMV (MAR) restriction on the residuals can be applied in the multivariate case and results in a similar MDM. A detailed discussion is provided in Web Appendix F.

9. Summary

Most pattern mixture models allow the missingness to be MNAR, with MAR as a unique point in the parameter space. The magnitude of departure from MAR can be quantified via a set of sensitivity parameters. For MNAR analysis, it is critical to find scientifically meaningful and dimensionally tractable sensitivity parameters. For this purpose, (multivariate) normal distributions are often found attractive since the MNAR departure from MAR can be parsimoniously defined by deviations in the mean and (co-)variance.

However, a simple pattern mixture model based on multivariate normality for the full-data response within patterns does not allow MAR without special restrictions that themselves induce a very restrictive missing data mechanism. We have explored this fully and proposed alternatives based on multivariate normality for the observed-data response within patterns. In both of these contexts, we proposed strategies for specifying sensitivity parameters.

The proposed modeling and sensitivity analysis approaches, based on within-pattern multivariate normality for either the full data response or the observed data response, may lead to contradictory study conclusions, owing to the different (unverifiable) missing data mechanism assumptions and the different observed data models. For the latter, we proposed using the deviance information criterion based on the observed data likelihood for model comparison and selection, and showed via simulations that it appears to perform well. Sensitivity analysis and inference can then be based on the model that best fits the observed data.
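For reference, the observed-data DIC takes the usual form $\mathrm{DIC} = \bar{D} + p_D$ with $p_D = \bar{D} - D(\bar{\theta})$, where the deviance is evaluated on the observed data likelihood. A minimal sketch for a univariate normal observed-data model with hypothetical posterior draws (the data and draws below are invented):

```python
import math

def deviance(data, mu, var):
    # -2 x observed-data normal log-likelihood
    return sum(math.log(2 * math.pi * var) + (y - mu) ** 2 / var for y in data)

def dic(data, draws):
    """draws: posterior samples of (mu, var); returns Dbar + pD, where
    pD = Dbar - D(posterior means). Illustrative, not the paper's model."""
    dbar = sum(deviance(data, m, v) for m, v in draws) / len(draws)
    mu_bar = sum(m for m, _ in draws) / len(draws)
    var_bar = sum(v for _, v in draws) / len(draws)
    p_d = dbar - deviance(data, mu_bar, var_bar)
    return dbar + p_d

data = [0.1, -0.4, 0.3, 0.2, -0.1]
draws = [(0.00, 1.00), (0.08, 1.15), (-0.05, 0.90), (0.03, 1.05)]
print(dic(data, draws) > 0)  # True for this illustrative input
```

Competing observed-data models would each be scored this way, with the smallest DIC preferred.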

In addition, we showed that introducing baseline covariates with time-invariant coefficients causes standard identifying restrictions to over-identify the model. This violates the principle that identifying restrictions should not affect the fit of the model to the observed data. We proposed a simple alternative set of restrictions based on residuals that can be used as an 'identification' starting point for an analysis using mixture models.
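To sketch the residual-based idea (with invented parameter values), the restriction is applied to the residuals $e_l = Y_l - X\alpha$ rather than to $Y$ itself: the incomplete pattern borrows the completers' conditional law of $e_2$ given $e_1$, which leaves the observed-data fit untouched even when the covariate coefficient is time-invariant.

```python
def residual_conditional(e1, pars):
    """Conditional mean/variance of e2 | e1 under the complete pattern's
    residual distribution; pars = (mu1, mu2, s11, s21, s22) are that
    pattern's residual moments (invented values below)."""
    mu1, mu2, s11, s21, s22 = pars
    mean = mu2 + (s21 / s11) * (e1 - mu1)
    var = s22 - s21 ** 2 / s11
    return mean, var

# dropout subject: observed y1, baseline covariate x, pattern-1 coefficient a1
y1, x, a1 = 4.0, 1.5, 0.6
e1 = y1 - x * a1                          # residual at time 1
m, v = residual_conditional(e1, (0.2, 0.9, 1.1, 0.4, 1.3))
y2_pred = x * a1 + m                      # back-transform to the Y scale
print(v > 0)  # True: the borrowed conditional variance is positive
```

The dropout's own coefficient `a1` is used on both the residual and back-transform steps, so the observed-data distribution is never re-fit.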

In the growth hormone study data example, we showed how to reduce the number of sensitivity parameters in practice and gave a default way to construct informative priors for the sensitivity parameters based on limited knowledge about the missingness; in particular, all values in the specified range were weighted equally via a uniform distribution. If additional external information from expert opinion or historical data is available, informative priors may be used to incorporate it (see, for example, Ibrahim and Chen, 2000; Wang et al., 2010). Finally, an important consideration in sensitivity analysis and in constructing informative priors is that they should avoid extrapolating missing values outside of a plausible range (e.g., in the growth hormone trial, we would not want to impute a negative quadriceps strength).

10. Supplementary Materials

Web Appendices and Tables referenced in Sections 2, 3, 5, 6, 7, and 8 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

Supplementary Material

Supplementary Data

Contributor Information

Chenguang Wang, Division of Biostatistics, Center for Devices and Radiological Health, FDA, Silver Spring, Maryland 20993.

Michael J. Daniels, Department of Statistics, University of Florida, Gainesville, FL 32611.

References

  • Azzalini A, Valle A. The multivariate skew-normal distribution. Biometrika. 1996;83:715–726.
  • Celeux G, Forbes F, Robert C, Titterington D. Deviance information criteria for missing data models. Bayesian Analysis. 2006;1:651–674.
  • Daniels M, Hogan J. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC; 2008.
  • Daniels M, Wang C. Discussion of "Missing data in longitudinal studies: a review" by Ibrahim and Molenberghs. TEST. 2009;18:51–58.
  • Harel O, Schafer J. Partial and latent ignorability in missing-data problems. Biometrika. 2009;96:37.
  • Hogan J, Laird N. Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine. 1997;16:259–272.
  • Ibrahim J, Chen M. Power prior distributions for regression models. Statistical Science. 2000;15:46–60.
  • Kenward M, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal data. Statistical Methods in Medical Research. 1999;8:51.
  • Kenward M, Molenberghs G, Thijs H. Pattern-mixture models with proper time dependence. Biometrika. 2003;90:53–71.
  • Little R. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association. 1995;90.
  • Little R, Rubin D. Statistical Analysis with Missing Data. Wiley; 1987.
  • Little R, Wang Y. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics. 1996;52:98–111.
  • Molenberghs G, Kenward M. Missing Data in Clinical Studies. Wiley; 2007.
  • Molenberghs G, Michiels B, Kenward M, Diggle P. Monotone missing data and pattern-mixture models. Statistica Neerlandica. 1998;52:153–161.
  • Rubin D. Inference and missing data. Biometrika. 1976;63:581–592.
  • Scharfstein D, Daniels M, Robins J. Incorporating prior beliefs about selection bias into the analysis of randomized trials with missing outcomes. Biostatistics. 2003;4:495.
  • Scharfstein D, Rotnitzky A, Robins J. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association. 1999;94:1096–1146.
  • Spiegelhalter D, Best N, Carlin B, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B. 2002;64:583–639.
  • Thijs H, Molenberghs G, Michiels B, Verbeke G, Curran D. Strategies to fit pattern-mixture models. Biostatistics. 2002;3:245.
  • Vansteelandt S, Goetghebeur E, Kenward M, Molenberghs G. Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica. 2006;16:953–979.
  • Wang C, Daniels M, Scharfstein D, Land S. A Bayesian shrinkage model for incomplete longitudinal binary data with application to the breast cancer prevention trial. Journal of the American Statistical Association. 2010 (in press).
  • Zhang J, Heitjan D. A simple local sensitivity analysis tool for nonignorable coarsening: application to dependent censoring. Biometrics. 2006;62:1260–1268.