Article sections

- Summary
- 1. Introduction
- 2. Existence of MAR under Multivariate Normality within Pattern
- 3. Sequential Model Specification and Sensitivity Analysis under MAR
- 4. Non-future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern
- 5. MAR and Sensitivity Analysis with Multivariate Normality on the Observed-data Response
- 6. Example: Growth Hormone Study
- 7. Model Comparison and Simulations
- 8. ACMV Restrictions and Multivariate Normality with Baseline Covariates
- 9. Summary
- 10. Supplementary Materials
- Supplementary Material
- References

Authors

Related links

Biometrics. Author manuscript; available in PMC 2012 September 1.


Published online 2011 March 1. doi: 10.1111/j.1541-0420.2011.01565.x

PMCID: PMC3136648

NIHMSID: NIHMS266680

Chenguang Wang, Division of Biostatistics, Center for Devices and Radiological Health, FDA, Silver Spring, Maryland 20993;

Chenguang Wang: chenguang.wang@fda.hhs.gov; Michael J. Daniels: mdaniels@stat.ufl.edu



## Summary

Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions are one approach to mixture model identification (Little, 1995; Little and Wang, 1996; Thijs et al., 2002; Kenward et al., 2003; Daniels and Hogan, 2008) and are a natural starting point for missing not at random sensitivity analysis (Thijs et al., 2002; Daniels and Hogan, 2008). However, when the pattern specific models are multivariate normal, identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g., baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. In addition, we propose alternative modeling and sensitivity analysis strategies under a less restrictive assumption for the distribution of the observed response data. We adopt the deviance information criterion for model comparison and perform a simulation study to evaluate the performances of the different modeling approaches. We also apply the methods to a longitudinal clinical trial. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

## 1. Introduction

For analyzing longitudinal studies with informative missingness, popular modeling frameworks include pattern mixture models, selection models and shared parameter models, which differ in the way the joint distribution of the outcome and missing data process is factorized (for a comprehensive review, see Little, 1995; Hogan and Laird, 1997; Kenward and Molenberghs, 1999; Molenberghs and Kenward, 2007; Daniels and Hogan, 2008). In this paper, we concern ourselves with pattern mixture models with monotone missingness (i.e., drop-out). For pattern mixture models with non-monotone (i.e., intermittent) missingness (details go beyond the scope of this paper), one approach is to partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) (Harel and Schafer, 2009; Wang et al., 2010).

It is well known that pattern-mixture models are not identified: the observed data does not provide enough information to identify the distributions for incomplete patterns. The use of identifying restrictions that equate the inestimable parameters to functions of estimable parameters is an approach to resolve the problem (Little, 1995; Little and Wang, 1996; Thijs et al., 2002; Kenward et al., 2003; Daniels and Hogan, 2008). Common identifying restrictions include complete case missing value (CCMV) constraints and available case missing value (ACMV) constraints. Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR), as defined by Rubin (1976) and Little and Rubin (1987). A key and attractive feature of identifying restrictions is that they do not impact the fit of the model to the observed data. Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Scharfstein et al., 2003; Zhang and Heitjan, 2006; Daniels and Hogan, 2008). In particular, MAR provides a good starting point for sensitivity analyses and sensitivity analyses are essential for inference on incomplete data (Scharfstein et al., 1999; Vansteelandt et al., 2006; Daniels and Hogan, 2008).

The normality of response data (if appropriate) for pattern mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan, 2008). However, multivariate normality within patterns can be overly restrictive. We explore such issues in this paper.

One criticism of mixture models is that they often induce missing data mechanisms that depend on the future (Kenward et al., 2003). We explore such non-future dependence in our context here and show how mixture models that have such missing data mechanisms have fewer sensitivity parameters.

In Section 2, we show conditions under which MAR exists and does not exist when the full-data response is assumed multivariate normal within each missing pattern. In Sections 3 and 4, in the same setting, we explore sensitivity analysis strategies under MNAR and under non-future dependent MNAR, respectively. In Section 5, we propose a sensitivity analysis approach where only the *observed* data within pattern are assumed multivariate normal. In Section 6, we apply the frameworks described in previous sections to a randomized clinical trial for estimating the effectiveness of recombinant growth hormone for increasing muscle strength in the elderly and propose a criterion to compare the fit of different models to the observed data. The behavior of this criterion and the model frameworks proposed in Sections 3–5 are assessed by simulation in Section 7. In Section 8, we show that in the presence of baseline covariates with time-invariant coefficients, standard identifying restrictions cause over-identification of the baseline covariate effects and we propose a remedy. We provide conclusions and discussion in Section 9.

## 2. Existence of MAR under Multivariate Normality within Pattern

Let $\mathbf{Y} = (Y_1, \ldots, Y_J)$ be the full-data response, scheduled to be measured at $J$ time points, and let $S \in \{1, \ldots, J\}$ denote the number of observed responses under monotone dropout, with $\overline{Y}_j = (Y_1, \ldots, Y_j)$ the response history through time $j$. We write $p_s(\cdot)$ for $p(\cdot \mid S = s)$ and $p_{\ge s}(\cdot)$ for $p(\cdot \mid S \ge s)$.

We show that MAR does not necessarily exist when it is assumed that

$$\mathbf{Y} \mid S = s \sim \text{N}\left(\boldsymbol{\mu}^{(s)}, \boldsymbol{\Sigma}^{(s)}\right) \quad \text{for all } s.$$

(1)

To see this, we introduce some further notation. Let

$$\boldsymbol{\mu}^{(s)}(j) = \mathrm{E}(\overline{Y}_j \mid S = s) = \begin{bmatrix} \boldsymbol{\mu}_1^{(s)}(j) \\ \mu_2^{(s)}(j) \end{bmatrix}$$

and

$$\boldsymbol{\Sigma}^{(s)}(j) = \mathrm{Var}(\overline{Y}_j \mid S = s) = \begin{bmatrix} \boldsymbol{\Sigma}_{11}^{(s)}(j) & \boldsymbol{\Sigma}_{12}^{(s)}(j) \\ \boldsymbol{\Sigma}_{21}^{(s)}(j) & \Sigma_{22}^{(s)}(j) \end{bmatrix}$$

where $\boldsymbol{\mu}_1^{(s)}(j) = \mathrm{E}(\overline{Y}_{j-1} \mid S = s)$, $\mu_2^{(s)}(j) = \mathrm{E}(Y_j \mid S = s)$, $\boldsymbol{\Sigma}_{11}^{(s)}(j) = \mathrm{Var}(\overline{Y}_{j-1} \mid S = s)$, $\Sigma_{22}^{(s)}(j) = \mathrm{Var}(Y_j \mid S = s)$, $\boldsymbol{\Sigma}_{12}^{(s)}(j) = \mathrm{Cov}(\overline{Y}_{j-1}, Y_j \mid S = s)$, and $\boldsymbol{\Sigma}_{21}^{(s)}(j)$ is the transpose of $\boldsymbol{\Sigma}_{12}^{(s)}(j)$.

For monotone dropout, under the model given in (1), define

$$\begin{array}{l} \kappa_1^{(s)}(j) = \boldsymbol{\Sigma}_{21}^{(s)}(j)\left(\boldsymbol{\Sigma}_{11}^{(s)}(j)\right)^{-1} \\ \kappa_2^{(s)}(j) = \mu_2^{(s)}(j) - \kappa_1^{(s)}(j)\,\boldsymbol{\mu}_1^{(s)}(j) \\ \kappa_3^{(s)}(j) = \Sigma_{22}^{(s)}(j) - \boldsymbol{\Sigma}_{21}^{(s)}(j)\left(\boldsymbol{\Sigma}_{11}^{(s)}(j)\right)^{-1}\boldsymbol{\Sigma}_{12}^{(s)}(j). \end{array}$$

**Lemma 1.** *The condition that, for a given $j$, the conditional distributions $p_s(y_j \mid \overline{y}_{j-1})$ are identical for all $s$ is equivalent to $\kappa_1^{(s)}(j)$, $\kappa_2^{(s)}(j)$ and $\kappa_3^{(s)}(j)$ being constant in $s$.*

The proof is trivial since

$$Y_j \mid \overline{Y}_{j-1}, S = s \sim \text{N}\!\left(\kappa_2^{(s)}(j) + \kappa_1^{(s)}(j)\,\overline{Y}_{j-1},\; \kappa_3^{(s)}(j)\right).$$
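As a quick numerical check of the quantities in Lemma 1, the $\kappa$'s are the standard conditional-normal coefficients and can be computed directly (an illustrative sketch; the function name and inputs are ours, not the authors'):

```python
import numpy as np

def kappas(mu, Sigma, j):
    """Coefficients of Y_j | Ybar_{j-1} under N(mu, Sigma); j is 1-indexed."""
    S11 = Sigma[:j - 1, :j - 1]
    S12 = Sigma[:j - 1, j - 1]
    k1 = np.linalg.solve(S11, S12)                # kappa_1(j): regression slopes
    k2 = mu[j - 1] - k1 @ mu[:j - 1]              # kappa_2(j): intercept
    k3 = Sigma[j - 1, j - 1] - S12 @ np.linalg.solve(S11, S12)  # kappa_3(j): residual variance
    return k1, k2, k3
```

Lemma 1 then amounts to checking that these three outputs agree across the pattern-specific $(\boldsymbol{\mu}^{(s)}, \boldsymbol{\Sigma}^{(s)})$.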

In other words, if the condition in Lemma 1 is satisfied, then there exists a common conditional distribution $p_{\ge s}(y_j \mid \overline{y}_{j-1})$ that coincides with $p_{s'}(y_j \mid \overline{y}_{j-1})$ for every pattern $s' \ge s$.

**Theorem 1.** *For pattern mixture models with monotone dropout, under the model given in (1), identification via MAR constraints exists if and only if $\boldsymbol{\mu}^{(s)}$ and $\boldsymbol{\Sigma}^{(s)}$ satisfy Lemma 1 for $s \ge j$ and $1 < j < J$.*

So, a default approach for continuous *Y*, assuming the full data response is multivariate normal within pattern, does not allow an MAR restriction (unless the restrictions in Theorem 1 are imposed).

We now examine the corresponding missing data mechanism (MDM), $S \mid \mathbf{Y}$. We use "$\overset{d}{=}$" to denote equality in distribution.

**Corollary 1.** *For pattern mixture models of the form (1) with monotone dropout, MAR holds if and only if $S \mid \mathbf{Y} \overset{d}{=} S \mid Y_1$.*

Thus, the implicit MDM is very restrictive: dropout may depend on the response only through $Y_1$, not on the entire history $\overline{Y}_s$. We now show connections to missing completely at random (MCAR) and other common identifying restrictions.

**Corollary 2.** *For pattern mixture models of the form (1) with monotone dropout, MCAR is equivalent to MAR if $p_s(y_1) = p(y_1)$ for all $s$.*

**Corollary 3.** *For pattern mixture models of the form (1) with monotone dropout, MAR constraints are identical to complete case missing value (CCMV) and nearest-neighbor (NCMV) constraints.*

The results in this section were all based on specifying the mixture model in (1) and demonstrate that MAR only exists under the fairly strict conditions given in Theorem 1.

## 3. Sequential Model Specification and Sensitivity Analysis under MAR

Due to the structure of $\boldsymbol{\mu}^{(s)}$ and $\boldsymbol{\Sigma}^{(s)}$ under MAR constraints as outlined in Section 2, we propose to follow the approach in Daniels and Hogan (2008, Chapter 8) and specify the distributions of the observed $Y$ within pattern as:

$$\begin{array}{ll} p_s(y_1) \sim \text{N}(\mu_1^{(s)}, \sigma_1^{(s)}) & \quad 1 \le s \le J \\ p_s(y_j \mid \overline{y}_{j-1}) \sim \text{N}(\mu_{j|j^-}^{(\ge j)}, \sigma_{j|j^-}^{(\ge j)}) & \quad 2 \le j \le s \le J \end{array}$$

(2)

where $j^- = \{1, 2, \ldots, j-1\}$ indexes the history being conditioned on, and the superscript $(\ge j)$ indicates that the parameters are common to all patterns $s \ge j$.

**Corollary 4.** *For pattern mixture models of the form (1) with monotone dropout, identification via MAR constraints exists if and only if the observed data can be modeled as (2).*

Corollary 4 implies that under the multivariate normality assumption in (1) and the MAR assumption, a sequential specification as in (2) always exists.

We provide some details for MAR in model (1) (which implies the specification in (2), as stated in Corollary 4) next. Distributions for the missing data (which are not identified) are specified as:

$$p_s(y_j \mid \overline{y}_{j-1}) \sim \text{N}(\mu_{j|j^-}^{(j)}, \sigma_{j|j^-}^{(j)}) \qquad s < j \le J.$$

The conditional mean structures of $\mu_{j|j^-}^{(\ge j)}$ and $\mu_{j|j^-}^{(j)}$ are parameterized as follows:

$$\mu_{j|j^-}^{(\ge j)} = \beta_0^{(\ge j)} + \sum_{l=1}^{j-1} \beta_l^{(\ge j)} y_l, \qquad \mu_{j|j^-}^{(j)} = \beta_0^{(j)} + \sum_{l=1}^{j-1} \beta_l^{(j)} y_l.$$

To identify the full-data model, the MAR constraints require that

$$p_k(y_j \mid \overline{y}_{j-1}) = p_{\ge j}(y_j \mid \overline{y}_{j-1})$$

for $k < j$, which implies that $\mu_{j|j^-}^{(j)} = \mu_{j|j^-}^{(\ge j)}$ and $\sigma_{j|j^-}^{(j)} = \sigma_{j|j^-}^{(\ge j)}$ for $2 \le j \le J$. Since the equality of the conditional means needs to hold for all $\mathbf{Y}$, this further implies that the MAR assumption requires $\beta_l^{(j)} = \beta_l^{(\ge j)}$ for $0 \le l \le j - 1$.

The motivation of the proposed sequential model is to allow a straightforward extension of the MAR specification to a large class of MNAR models indexed by parameters measuring departures from MAR, as well as the attraction of doing sensitivity analysis on means and/or variances in normal models.

For example, we can let

$$\beta_l^{(j)} = \Delta_l^{(j)} + \beta_l^{(\ge j)} \qquad \text{and} \qquad \log \sigma_{j|j^-}^{(j)} = \Delta_\sigma^{(j)} + \log \sigma_{j|j^-}^{(\ge j)}$$

for all $j > 1$ and $0 \le l < j$. Sensitivity analysis can be done on these $\Delta$ parameters, which capture the information about the missing data mechanism (see Web Appendix B for the impact of the $\Delta$ parameters on the MDM). For example, in a Bayesian framework, we may assign informative priors elicited from experts to these sensitivity parameters $\Delta$. Note that in general we may have separate $\Delta_l^{(j)}$ and $\Delta_\sigma^{(j)}$ for each pattern $s$ ($s < j$), but in practice it is necessary to limit the dimensionality of these (Daniels and Hogan, 2008). Indeed, we could make $\Delta_l^{(j)}$ and $\Delta_\sigma^{(j)}$ independent of $j$ to further reduce the number of sensitivity parameters.
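To make the role of the $\Delta$'s concrete, the following sketch draws a missing $Y_j$ from the $\Delta$-shifted sequential normal model; the function and all parameter values are hypothetical illustrations, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_missing(y_hist, beta_geq, log_sigma_geq, delta, delta_sigma, n=1):
    """Draw Y_j given ybar_{j-1} for a dropout pattern s < j.

    beta_geq:    identified coefficients beta_0^{(>=j)}, ..., beta_{j-1}^{(>=j)}
    delta:       MNAR offsets Delta_0^{(j)}, ..., Delta_{j-1}^{(j)}
    delta_sigma: offset Delta_sigma^{(j)} on the log residual variance
    """
    b = np.asarray(beta_geq) + np.asarray(delta)      # beta^{(j)} = Delta^{(j)} + beta^{(>=j)}
    mean = b[0] + b[1:] @ np.asarray(y_hist)
    sd = np.exp(0.5 * (log_sigma_geq + delta_sigma))  # log sigma^{(j)} = Delta_sigma^{(j)} + log sigma^{(>=j)}
    return rng.normal(mean, sd, size=n)
```

Setting all $\Delta$'s to zero recovers the identified (MAR) conditional.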

In general, the MDM depends on $\overline{Y}_J$, i.e., is MNAR, in the presence of the $\Delta$ parameters. However, one might want the hazard of dropout at time $j$ to depend on the observed history and the current (possibly missing) value, but not on future values; we consider this next.

## 4. Non-future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern

Non-future dependence assumes that missingness only depends on observed data and the current missing value, i.e.,

$$P(S = s \mid \mathbf{Y}) = P(S = s \mid \overline{Y}_{s+1}),$$

and can be viewed as a special case of MNAR and an extension of MAR (Kenward et al., 2003). Kenward et al. (2003) showed that non-future dependence holds if and only if for each *j* ≥ 3 and *k* < *j* − 1,

$$p_k(y_j \mid \overline{y}_{j-1}) = p_{\ge j-1}(y_j \mid \overline{y}_{j-1}).$$

An approach to implement non-future dependence within the framework of Section 3 is as follows. We model the observed data as in (2). For the conditional distribution of the current missing value ($Y_{s+1}$ given pattern $S = s$), we allow MNAR departures:

$$p_s(y_{s+1} \mid \overline{y}_s) \sim \text{N}\!\left(\beta_0^{(\ge s+1)} + \Delta_0^{(s+1)} + \sum_{l=1}^{s}\left(\beta_l^{(\ge s+1)} + \Delta_l^{(s+1)}\right) y_l,\; e^{\Delta_\sigma^{(s+1)}}\,\sigma_{s+1|(s+1)^-}^{(\ge s+1)}\right) \qquad 1 \le s \le J-1$$

and for the conditional distribution of the future missing data ($Y_j$ with $j > s + 1$), we impose

$$p_s(y_j \mid \overline{y}_{j-1}) = p_{\ge j-1}(y_j \mid \overline{y}_{j-1}) \qquad s + 1 < j \le J,$$

where

$$p_{\ge j-1}(y_j \mid \overline{y}_{j-1}) = \frac{p(S = j-1)}{p(S \ge j-1)}\, p_{j-1}(y_j \mid \overline{y}_{j-1}) + \frac{p(S \ge j)}{p(S \ge j-1)}\, p_{\ge j}(y_j \mid \overline{y}_{j-1}).$$

Note that by this approach, although the model for future missing data is a mixture of normals, the sensitivity parameters are kept the same as in Section 3 ($\Delta_l^{(j)}$ and $\Delta_\sigma^{(j)}$, $j = 2, \ldots, J$ and $l = 0, \ldots, j-1$). In addition, this significantly reduces the number of potential sensitivity parameters. For $J$-dimensional longitudinal data, the total number of sensitivity parameters, $(2J^3 + 3J^2 + J)/6 - J$, is reduced to $(J^2 + 3J - 4)/2$; for $J = 3$ (6), from 11 (85) to 7 (25). Further reduction is typically needed; see the data example in Section 6 as an illustration. If all of the remaining sensitivity parameters are set to zero, then $p_s(y_{s+1} \mid \overline{y}_s) = p_{\ge s+1}(y_{s+1} \mid \overline{y}_s)$ for all $s$, and the model reduces to MAR.
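The parameter counts quoted above are easy to verify (a small sketch; the function names are ours):

```python
def n_sp_mnar(J):
    """Total sensitivity parameters under unrestricted MNAR: (2J^3 + 3J^2 + J)/6 - J."""
    return (2 * J**3 + 3 * J**2 + J) // 6 - J

def n_sp_nfd(J):
    """Total sensitivity parameters under non-future dependence: (J^2 + 3J - 4)/2."""
    return (J**2 + 3 * J - 4) // 2
```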

## 5. MAR and Sensitivity Analysis with Multivariate Normality on the Observed-data Response

If we assume multivariate normality only on the observed-data response, $Y_{\text{obs}} \mid S$, instead of on the full-data response, $\mathbf{Y} \mid S$, we avoid the restrictive implications described in Section 2 while retaining much of the convenience of normal models.

For example, we may specify distributions *Y*_{obs}|*S* as follows:

$$\begin{array}{ll} p_s(y_1) \sim \text{N}(\mu_1^{(s)}, \sigma_1^{(s)}) & \quad 1 \le s \le J \\ p_s(y_j \mid \overline{y}_{j-1}) \sim \text{N}(\mu_{j|j^-}^{(s)}, \sigma_{j|j^-}^{(s)}) & \quad 2 \le j \le s \le J \end{array}$$

where

$$\mu_{j|j^-}^{(s)} = \beta_0^{(s)} + \sum_{l=1}^{j-1} \beta_l^{(s)} y_l.$$

To identify the full-data model, recall the MAR constraints imply that

$$p_s(y_j \mid \overline{y}_{j-1}) = p_{\ge j}(y_j \mid \overline{y}_{j-1}) = \sum_{k=j}^{J} \frac{P(S = k)}{P(S \ge j)}\, p_k(y_j \mid \overline{y}_{j-1})$$

(3)

for $s < j$, which are mixtures of normals. For sensitivity analysis in this setting, we propose to introduce sensitivity parameters $\Delta_\mu^{(j)}$ (location) and $\Delta_\sigma^{(j)}$ (scale) as follows,

$$p_s(y_j \mid \overline{y}_{j-1}) = e^{-\Delta_\sigma^{(j)}} \sum_{k=j}^{J} \pi_{j,k}\; p_k\!\left(e^{-\Delta_\sigma^{(j)}}\left(y_j - \Delta_\mu^{(j)}\right) \,\Big|\, \overline{y}_{j-1}\right)$$

(4)

where $\pi_{j,k} = P(S = k)/P(S \ge j)$. The rationale for this parameterization is that each $p_k(\cdot \mid \overline{y}_{j-1})$ is a normal density, so the location shift $\Delta_\mu^{(j)}$ and the scale factor $e^{\Delta_\sigma^{(j)}}$ act directly on the mean and standard deviation of each mixture component.

In this set up, we have

$$\mu_{j|j^-,\text{MNAR}}^{(s)} = \Delta_\mu^{(j)} + e^{\Delta_\sigma^{(j)}} \sum_{k=j}^{J} \pi_{j,k}\, \mu_{j|j^-}^{(k)}$$

and

$$\sigma_{j|j^-,\text{MNAR}}^{(s)} = e^{2\Delta_\sigma^{(j)}} \left\{ \sum_{k=j}^{J} \pi_{j,k} \left[\sigma_{j|j^-}^{(k)} + \left(\mu_{j|j^-}^{(k)}\right)^2\right] - \left(\sum_{k=j}^{J} \pi_{j,k}\, \mu_{j|j^-}^{(k)}\right)^2 \right\}$$

(see Web Appendix C for details). Note that $\mu_{j|j^-,\text{MNAR}}^{(s)}$ does not depend on $\sigma_{j|j^-}^{(k)}$ for $k = j, \ldots, J$.

Under an MAR assumption (3), for $[Y_j \mid \overline{y}_{j-1}, S = s]$ with $s < j$, the corresponding mean and variance are

$$\mu_{j|j^-,\text{MAR}}^{(s)} = \sum_{k=j}^{J} \pi_{j,k}\, \mu_{j|j^-}^{(k)}$$

and

$$\sigma_{j|j^-,\text{MAR}}^{(s)} = \sum_{k=j}^{J} \pi_{j,k} \left[\sigma_{j|j^-}^{(k)} + \left(\mu_{j|j^-}^{(k)}\right)^2\right] - \left(\mu_{j|j^-,\text{MAR}}^{(s)}\right)^2.$$

Therefore, under MNAR assumption (4), the two sensitivity parameters control the departure of the mean and variance from MAR in the following way,

$$\mu_{j|j^-,\text{MNAR}}^{(s)} = \Delta_\mu^{(j)} + e^{\Delta_\sigma^{(j)}}\, \mu_{j|j^-,\text{MAR}}^{(s)}, \qquad \sigma_{j|j^-,\text{MNAR}}^{(s)} = e^{2\Delta_\sigma^{(j)}}\, \sigma_{j|j^-,\text{MAR}}^{(s)},$$

with $\Delta_\mu^{(j)}$ being a location parameter and $\Delta_\sigma^{(j)}$ being a scale parameter. The MNAR class allows MAR when $\Delta_\mu^{(j)} = \Delta_\sigma^{(j)} = 0$ for all $j \ge 2$.
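These two location-scale relations can be checked by Monte Carlo, since a draw from the shifted model (4) is $\Delta_\mu^{(j)}$ plus $e^{\Delta_\sigma^{(j)}}$ times a draw from the MAR mixture (3). The weights and component parameters below are arbitrary illustrative values, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

pi = np.array([0.5, 0.3, 0.2])     # pi_{j,k} = P(S = k) / P(S >= j)
mu = np.array([0.0, 1.0, 3.0])     # component conditional means
sd = np.array([1.0, 0.5, 2.0])     # component conditional standard deviations
d_mu, d_sig = -1.5, 0.3            # Delta_mu^{(j)}, Delta_sigma^{(j)}

k = rng.choice(3, size=200_000, p=pi)   # component labels of the MAR mixture (3)
x = rng.normal(mu[k], sd[k])            # MAR draws
y = d_mu + np.exp(d_sig) * x            # location-scale shifted MNAR draws, as in (4)

m_mar = pi @ mu                          # MAR mean
v_mar = pi @ (sd**2 + mu**2) - m_mar**2  # MAR variance (mixture formula)
```

The sample mean and variance of `y` should then match $\Delta_\mu + e^{\Delta_\sigma} \mu_{\text{MAR}}$ and $e^{2\Delta_\sigma} \sigma_{\text{MAR}}$, respectively.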

By assuming non-future dependence, we obtain

$$p_s(y_j \mid \overline{y}_{j-1}) = p_{\ge j-1}(y_j \mid \overline{y}_{j-1}) = \frac{p(S = j-1)}{p(S \ge j-1)}\, e^{-\Delta_\sigma^{(j)}} \sum_{k=j}^{J} \pi_{j,k}\; p_k\!\left(e^{-\Delta_\sigma^{(j)}}\left(y_j - \Delta_\mu^{(j)}\right) \,\Big|\, \overline{y}_{j-1}\right) + \frac{p(S \ge j)}{p(S \ge j-1)} \sum_{k=j}^{J} \pi_{j,k}\, p_k(y_j \mid \overline{y}_{j-1})$$

for the future data and (4) for the current data ($j = s + 1$). The number of sensitivity parameters in this setup is reduced from $J(J-1)$ by $(J-2)(J-1)$ to $2(J-1)$; so, for $J = 3$ (6), from 6 (30) to 4 (10). Further reductions are illustrated in Section 6.

## 6. Example: Growth Hormone Study

We analyze a longitudinal clinical trial using the frameworks from Sections 4 and 5, which assume multivariate normality for the full-data response within pattern (MVN) or multivariate normality for the observed-data response within pattern (OMVN). We assume non-future dependence for the missing data mechanism to minimize the number of sensitivity parameters.

The growth hormone (GH) trial was a randomized clinical trial conducted to estimate the effectiveness of recombinant human growth hormone therapy for increasing muscle strength in the elderly. The trial had four treatment arms: placebo (P), growth hormone only (G), exercise plus placebo (EP), and exercise plus growth hormone (EG). Muscle strength, here mean quadriceps strength (QS), measured as the maximum foot-pounds of torque that can be exerted against resistance provided by a mechanical device, was measured at baseline, 6 months, and 12 months. There were 161 participants enrolled in this study, but only (roughly) 75% of them completed the 12-month follow-up. Researchers believed that dropout was related to the unobserved strength measures at the dropout times.

For illustration, we confine our attention to the two arms using exercise: exercise plus growth hormone (EG) and exercise plus placebo (EP). Table 1 contains the observed data for the two arms.

Let $(Y_1, Y_2, Y_3)$ denote the full-data response corresponding to baseline, 6 months, and 12 months. Let $Z$ be the treatment indicator (1 = EG, 0 = EP). Our goal is to draw inference about the mean difference of QS between the two treatment arms at month 12, that is, the treatment effect $\theta = E(Y_3 \mid Z = 1) - E(Y_3 \mid Z = 0)$. In the full-data model for each treatment under non-future dependence, there are seven sensitivity parameters for the MVN model, $\{\Delta_0^{(2)}, \Delta_1^{(2)}, \Delta_0^{(3)}, \Delta_1^{(3)}, \Delta_2^{(3)}, \Delta_\sigma^{(2)}, \Delta_\sigma^{(3)}\}$, and four sensitivity parameters for the OMVN model, $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}, \Delta_\sigma^{(2)}, \Delta_\sigma^{(3)}\}$; see Web Appendix D for details on the models. For the MNAR analysis, we reduced the number of sensitivity parameters as follows:

- $\Delta_\sigma^{(2)}$ and $\Delta_\sigma^{(3)}$ do not appear in the posterior distribution of $E(Y_3 \mid Z)$ for $Z = 0, 1$, and thus are not necessary for inference on $\theta$.
- We restrict to MNAR departures from MAR in terms of the intercept terms by assuming $\Delta_1^{(2)} = \Delta_1^{(3)} = \Delta_2^{(3)} = 0$.
- We assume the sensitivity parameters are identical between treatments.

This reduces the set of sensitivity parameters to $\{\Delta_0^{(2)}, \Delta_0^{(3)}\}$ for the MVN model and $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}\}$ for the OMVN model.

There are a variety of ways to specify priors for the sensitivity parameters. $\Delta_0^{(2)}$ and $\Delta_0^{(3)}$ have the interpretations

$$\begin{array}{l} \Delta_0^{(2)} = E(Y_2 \mid Y_1, S = 1) - E(Y_2 \mid Y_1, S \ge 2) \\ \Delta_0^{(3)} = E(Y_3 \mid Y_2, Y_1, S = 2) - E(Y_3 \mid Y_2, Y_1, S = 3). \end{array}$$

Both represent the difference of conditional means between the observed and unobserved responses. ${\mathrm{\Delta}}_{\mu}^{(2)}$ and ${\mathrm{\Delta}}_{\mu}^{(3)}$ have (roughly) the same interpretation as ${\mathrm{\Delta}}_{0}^{(2)}$ and ${\mathrm{\Delta}}_{0}^{(3)}$.

Based on discussion with investigators, we made the assumption that dropouts do worse than completers; thus, we restrict the $\Delta$'s to be less than or equal to zero. To do a fully Bayesian analysis that fairly characterizes the uncertainty associated with the missing data mechanism, we assume a uniform prior for the $\Delta$'s as a default choice. Subject matter considerations gave an upper bound of zero for the uniform distributions. We set the lower bounds using the variability of the observed data as follows. We estimate the residual variances of $Y_2 \mid Y_1$ and $Y_3 \mid Y_2, Y_1$ using the observed data; we denote these by $\tau_{2|1}$ and $\tau_{3|2,1}$ respectively. We use the square roots of these estimates to set the lower bounds. In particular, we specify the priors for $\{\Delta_0^{(2)}, \Delta_0^{(3)}\}$ as well as $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}\}$ as uniform on $\Omega(\boldsymbol{\tau})$, where

$$\Omega(\boldsymbol{\tau}) = \left[-\tau_{2|1}^{1/2},\, 0\right] \times \left[-\tau_{3|2,1}^{1/2},\, 0\right].$$

(5)

Based on the estimates $\tau_{2|1}^{1/2} = 18$ and $\tau_{3|2,1}^{1/2} = 12$, the priors are uniform on $[-18, 0] \times [-12, 0]$ for both $\{\Delta_0^{(2)}, \Delta_0^{(3)}\}$ and $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}\}$. For the other parameters in the full-data model, we assign $N(0, 10^6)$ priors for mean parameters ($\mu$, $\beta$) and Unif(0, 100) priors for standard deviation parameters ($\sigma^{1/2}$).
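A sketch of this lower-bound construction (the helper below is ours, not the authors' code): estimate the residual standard deviation from an OLS fit to the observed completers, then use its negative as the lower endpoint of the uniform prior.

```python
import numpy as np

def residual_sd(y, X):
    """Root residual variance of y given X from an OLS fit (intercept added)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return np.sqrt(r @ r / (len(y) - Z.shape[1]))

# tau_{2|1}^{1/2}:   residual_sd of observed Y2 regressed on Y1
# tau_{3|2,1}^{1/2}: residual_sd of observed Y3 regressed on (Y1, Y2)
# prior for the corresponding Delta: Uniform(-residual_sd, 0)
```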

We fit the model using WinBUGS, with multiple chains of 25,000 iterations each and a burn-in of 4,000. Convergence was checked by examining trace plots of the multiple chains.

The results of the MVN and OMVN models are given in Table 2. Under MNAR, the posterior mean (posterior standard deviation) of the difference in quadriceps strength at 12 months between the two treatment arms was 4.0 (8.9) and 4.4 (10) for the MVN and OMVN models. Under MAR the differences were 5.4 (8.8) and 5.8 (9.9) for the MVN and OMVN models, respectively. The smaller differences under MNAR were due to quadriceps strength at 12 months being lower under MNAR due to the assumption that dropouts do worse than completers.

To compare the fit of models OMVN and MVN, we used the deviance information criterion that is based on the observed data likelihood (DIC* _{O}*),

$$\text{DIC}_O = -4\,E\{\log L(\theta \mid \boldsymbol{y}_{\text{obs}}, S)\} + 2 \log L\{E(\theta \mid \boldsymbol{y}_{\text{obs}}, S) \mid \boldsymbol{y}_{\text{obs}}, S\}.$$

(6)
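Given posterior draws of $\theta$ and a routine evaluating the observed-data log-likelihood, DIC$_O$ in (6) can be computed as follows (an illustrative sketch; `log_lik` is a user-supplied function, not part of any package):

```python
import numpy as np

def dic_o(log_lik, theta_draws):
    """DIC based on the observed-data likelihood, as in (6).

    log_lik:      function returning log L(theta | y_obs, S) for one draw
    theta_draws:  posterior draws of theta (first axis indexes draws)
    """
    ll = np.array([log_lik(t) for t in theta_draws])
    theta_bar = np.mean(theta_draws, axis=0)   # posterior mean E(theta | y_obs, S)
    return -4.0 * ll.mean() + 2.0 * log_lik(theta_bar)
```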

The results favored the OMVN model (Table 2). The behavior of DIC$_O$ for model selection in the setting of incomplete data is explored in Section 7. Note that given the OMVN or MVN specification for the observed data, the fit (as measured by DIC$_O$) does not depend on the sensitivity parameters, since MNAR departures do not alter the observed-data likelihood.

To assess sensitivity to the informative uniform priors, we evaluated several scenarios in which the priors were made more or less informative by modifying their ranges; the specific ranges considered and the corresponding results are given in Table 3.

## 7. Model Comparison and Simulations

Although the specification in Sections 3–4 arguably offers a simpler sensitivity analysis than that in Section 5, the former may not fit the observed data as well (as indicated by DIC$_O$ in Section 6). In addition, if the conclusions between OMVN and MVN were substantially different (unlike in the example here), we would need to decide which model to use for inference.

Spiegelhalter et al. (2002) developed the deviance information criterion (DIC) for Bayesian model comparison. In the setting of incomplete data, many alternative representations of DIC have been proposed (Celeux et al., 2006; Daniels and Hogan, 2008). The underlying complication is the fact that we do not observe the full data. Recommendations in Celeux et al. (2006) were based on missing data that were actually latent variables in random effects and mixture models, not potentially observable missing data as is our focus. Daniels and Hogan (2008) recommended DIC$_O$ (6) on intuitive grounds but did not explore its operating characteristics. The following simulation study explores the behavior of DIC$_O$ in this setting.

We simulated observed data from the MVN and OMVN models. Parameters estimated by fitting the MVN and OMVN models to the observed EG arm data from the growth hormone study were the basis for the simulation (see Table 4).

For each sample size (50, 150 and 300), we simulated 500 datasets under the MVN and OMVN observed-data models. The performance of the different approaches was assessed using both DIC$_O$ and the mean squared error (MSE) of the estimate of $E(Y_3)$.

Simulation Results: Comparison of MSE and performance of DIC_{O} in MVN and OMVN models. The columns correspond to the models fit and the rows to the scenario under which the data was generated.

The simulations favored the correctly specified model with respect to both DIC$_O$ and MSE. As the sample size increased, the MSEs decreased and the difference between the correct and the incorrect models increased.

To compare the robustness of the MVN and OMVN models when neither is correct, we considered selection models where the distribution of $\mathbf{Y}$ is heavy-tailed or asymmetric. Specifically, we simulated response data from heavy-tailed and skewed multivariate distributions under selection-model missingness.

The average DIC$_O$ values are reported in Table 6. Note that the MAR and MNAR cases correspond to different observed-data models in this simulation setting. Therefore, we report DIC$_O$ for the MAR and MNAR scenarios separately.

Simulation Results: Assessing fit of MVN and OMVN models using DIC_O under heavy-tailed and skewed data generating mechanisms. Values in the table are DIC_O (rate in favor), where rate in favor is the percentage of simulated datasets for which the model was favored.

From the simulations, we can see that the MVN model is more likely to be selected by DIC$_O$ when the sample size is small. When the sample size is moderate or large, the OMVN model fits the observed data better in all the cases considered and offers more robustness (in particular for the heavy-tailed scenario). OMVN appears to be the better default choice for medium to large sample sizes.

## 8. ACMV Restrictions and Multivariate Normality with Baseline Covariates

In this section, we show that common identifying restrictions over-identify estimable parameters in the presence of baseline covariates with time-invariant coefficients, and we offer a solution.

Consider the situation where $\mathbf{Y} = (Y_1, Y_2)$ is bivariate with $Y_2$ possibly missing, $X$ is a baseline covariate, and $S \in \{1, 2\}$ indicates the number of observed responses. Assume

$$\begin{array}{l} S \mid X \sim \text{Bern}(\phi(X)) \\ \mathbf{Y} \mid S = s, X \sim \text{N}\left(\boldsymbol{\mu}^{(s)}(X), \boldsymbol{\Sigma}^{(s)}\right) \qquad s = 1, 2 \end{array}$$

where

$$\boldsymbol{\mu}^{(s)}(X) = \begin{bmatrix} \mu_1^{(s)} + X\alpha^{(s)} \\ \mu_2^{(s)} + X\alpha^{(s)} \end{bmatrix} \qquad \text{and} \qquad \boldsymbol{\Sigma}^{(s)} = \begin{bmatrix} \sigma_{11}^{(s)} & \sigma_{12}^{(s)} \\ \sigma_{21}^{(s)} & \sigma_{22}^{(s)} \end{bmatrix}.$$

MAR (ACMV) implies the following restriction

$$[Y_2 \mid Y_1, X, S = 1] \overset{d}{=} [Y_2 \mid Y_1, X, S = 2].$$

This implies that conditional means, *E*(*Y*_{2}|*Y*_{1}, *X*, *S* = *s*) for *s* = 1, 2, are equal, i.e.

$${\mu}_{2}^{(1)}+X{\alpha}^{(1)}+\frac{{\sigma}_{21}^{(1)}}{{\sigma}_{11}^{(1)}}({Y}_{1}-{\mu}_{1}^{(1)}-X{\alpha}^{(1)})={\mu}_{2}^{(2)}+X{\alpha}^{(2)}+\frac{{\sigma}_{21}^{(2)}}{{\sigma}_{11}^{(2)}}({Y}_{1}-{\mu}_{1}^{(2)}-X{\alpha}^{(2)}).$$

(7)

For (7) to hold for all *Y*_{1} and *X*, we need that

$${\alpha}^{(1)}={\alpha}^{(2)}.$$

However, both *α*^{(1)} and *α*^{(2)} are already identified by the observed data *Y*_{1}. Thus the ACMV (MAR) restriction affects the model fit to the observed data. This is against the principle of applying identifying restrictions (Little and Wang, 1996; Daniels and Wang, 2009).

To resolve the over-identification issue, we propose to apply MAR constraints on residuals instead of directly on the responses. In the bivariate case, the corresponding restriction is

$$\left[Y_2 - X\alpha^{(1)} \mid Y_1 - X\alpha^{(1)}, X, S = 1\right] \overset{d}{=} \left[Y_2 - X\alpha^{(2)} \mid Y_1 - X\alpha^{(2)}, X, S = 2\right].$$

(8)

Since the conditional distributions

$$\left[Y_2 - X\alpha^{(s)} \mid Y_1 - X\alpha^{(s)} = r_1, X, S = s\right] \sim \text{N}\!\left(\mu_2^{(s)} + \frac{\sigma_{21}^{(s)}}{\sigma_{11}^{(s)}}\left(r_1 - \mu_1^{(s)}\right),\; \sigma_{22}^{(s)} - \frac{\left(\sigma_{21}^{(s)}\right)^2}{\sigma_{11}^{(s)}}\right)$$

are independent of *α*^{(}^{s}^{)} for *s* = 1, 2, the restriction (8) places no constraints on *α*^{(}^{s}^{)}, thus avoiding over-identification.

The MDM corresponding to ACMV (MAR) on the residuals is given by

$$\log\frac{P(S = 1 \mid \mathbf{Y}, X)}{P(S = 2 \mid \mathbf{Y}, X)} = \log\frac{\phi(X)}{1 - \phi(X)} + \log\frac{\text{N}\!\left(Y_1;\, \mu_1^{(1)} + X\alpha^{(1)},\, \sigma_{11}^{(1)}\right)}{\text{N}\!\left(Y_1;\, \mu_1^{(2)} + X\alpha^{(2)},\, \sigma_{11}^{(2)}\right)} - \frac{1}{2\sigma^{*}}\left\{\left(Y_2 - X\alpha^{(1)} - \Delta(Y_1 - X\alpha^{(1)})\right)^2 - \left(Y_2 - X\alpha^{(2)} - \Delta(Y_1 - X\alpha^{(2)})\right)^2\right\}$$

where $\text{N}(\cdot;\, m, v)$ denotes the normal density with mean $m$ and variance $v$, $\sigma^{*} = \sigma_{22}^{(2)} - (\sigma_{21}^{(2)})^2/\sigma_{11}^{(2)}$, and $\Delta(u) = \mu_2^{(2)} + \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}}(u - \mu_1^{(2)})$. Hence, by assuming MAR on the residuals, the MDM is a quadratic function of $Y_1$, but independent of $Y_2$ if and only if $\alpha^{(2)} = \alpha^{(1)}$. In other words, assumption (8) implies MAR if and only if $\alpha^{(2)} = \alpha^{(1)}$. So in general, MAR on residuals does not imply that missingness in $Y_2$ is MAR. However, it is an identifying restriction that does not impact the fit of the model to the observed data. CCMV and NCMV restrictions can be applied similarly to the residuals.
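As a numeric illustration of this claim (a sketch with arbitrary parameter values; `log_odds` and its argument names are ours, not the authors'): the log odds is free of $Y_2$ exactly when $\alpha^{(1)} = \alpha^{(2)}$.

```python
import numpy as np

def norm_logpdf(x, m, v):
    """Log density of N(m, v) at x."""
    return -0.5 * (np.log(2.0 * np.pi * v) + (x - m) ** 2 / v)

def log_odds(y1, y2, X, phi, pars):
    """log P(S=1 | Y, X) / P(S=2 | Y, X) under MAR-on-residuals.

    pars holds pattern-specific (mu1_s, s11_s, alpha_s) for s = 1, 2 and the
    shared residual conditional: Delta(u) = mu2 + c (u - mu1), variance s_star.
    """
    delta = lambda u: pars["mu2"] + pars["c"] * (u - pars["mu1"])
    lo = np.log(phi / (1.0 - phi))
    for s, sign in ((1, 1.0), (2, -1.0)):
        a = pars["alpha%d" % s]
        # marginal of Y1 in pattern s
        lo += sign * norm_logpdf(y1, pars["mu1_%d" % s] + X * a, pars["s11_%d" % s])
        # conditional of Y2 given Y1, imposed by restriction (8)
        lo += sign * norm_logpdf(y2, X * a + delta(y1 - X * a), pars["s_star"])
    return lo

base = dict(mu1_1=0.2, s11_1=1.0, mu1_2=0.0, s11_2=1.5,
            mu1=0.0, mu2=1.0, c=0.6, s_star=0.8)
```

Evaluating `log_odds` at two values of `y2` gives identical results when `alpha1 == alpha2` and different results otherwise.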

In general, $\mu_l^{(s)}$ can be replaced by $\mu_{il}^{(s)}$ if there are subject-specific covariates with time-varying coefficients.

The ACMV (MAR) on the residuals restriction can be applied to the multivariate case and results in a similar MDM. Detailed discussion is provided in Web Appendix F.

## 9. Summary

Most pattern mixture models allow the missingness to be MNAR, with MAR as a unique point in the parameter space. The magnitude of departure from MAR can be quantified via a set of sensitivity parameters. For MNAR analysis, it is critical to find scientifically meaningful and dimensionally tractable sensitivity parameters. For this purpose, (multivariate) normal distributions are often found attractive since the MNAR departure from MAR can be parsimoniously defined by deviations in the mean and (co-)variance.

However, a simple pattern mixture model based on multivariate normality for the full-data response within patterns does not allow MAR without special restrictions that themselves induce a very restrictive missing data mechanism. We have explored this fully and proposed alternatives based on multivariate normality for the observed-data response within patterns. In both contexts, we proposed strategies for specifying sensitivity parameters.

The proposed modeling and sensitivity analysis approaches based on within-pattern multivariate normality for the full-data response or the observed-data response may lead to contradictory study conclusions, due to the different (unverifiable) missing data mechanism assumptions and the different observed-data models. For the latter, we proposed to use the deviance information criterion based on the observed-data likelihood for model comparison and selection, and showed via simulations that it appears to perform well. Sensitivity analysis and inferences can then be based upon the model that best fits the observed data.
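A minimal sketch of the DIC computation from posterior draws, DIC = D̄ + p_D with p_D = D̄ − D(θ̄) (Spiegelhalter et al., 2002), assuming a toy normal observed-data likelihood with known unit variance and mock posterior samples (all values illustrative, not from the paper's models):

```python
import math
import random

random.seed(2)

# Mock observed responses and mock posterior draws of the mean (sd known = 1).
y_obs = [0.3, -0.1, 0.8, 1.2, 0.4]
post_mu = [random.gauss(0.52, 0.2) for _ in range(4000)]


def deviance(mu):
    """D(theta) = -2 * observed-data log-likelihood (normal, sd = 1)."""
    ll = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2 for y in y_obs)
    return -2.0 * ll


d_bar = sum(deviance(m) for m in post_mu) / len(post_mu)  # posterior mean deviance
mu_bar = sum(post_mu) / len(post_mu)                      # posterior mean of theta
p_d = d_bar - deviance(mu_bar)                            # effective number of parameters
dic = d_bar + p_d
print(dic)
```

For incomplete data the key point is that `deviance` is evaluated on the *observed-data* likelihood (missing responses integrated out), not on a completed-data likelihood; with multivariate normal models that marginal is available in closed form.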

In addition, we showed that introducing baseline covariates with time-invariant coefficients causes standard identifying restrictions to over-identify the model. This violates the principle of applying identifying restrictions, namely that they should not affect the fit of the model to the observed data. We proposed a simple alternative set of restrictions based on residuals that can be used as an 'identification' starting point for an analysis using mixture models.

In the growth hormone study data example, we showed how to reduce the number of sensitivity parameters in practice and demonstrated a default way to construct informative priors for sensitivity parameters based on limited knowledge about the missingness. In particular, all values in the plausible range were weighted equally via a uniform distribution. If additional external information from expert opinion or historical data is available, informative priors may be used to incorporate it (for example, see Ibrahim and Chen, 2000; Wang et al., 2010). Finally, an important consideration in sensitivity analysis and in constructing informative priors is to avoid extrapolating missing values outside a reasonable range (e.g., in the growth hormone trial example, we would not want to impute a negative quadriceps strength).
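One way to operationalize these two points together, a uniform prior over a plausible range for the sensitivity parameter and imputations restricted to non-negative values, is sketched below (the range, the conditional mean `delta_y1`, and the unit residual standard deviation are all hypothetical):

```python
import random

random.seed(3)

# Hypothetical plausible range for the sensitivity parameter tau.
lo_tau, hi_tau = -1.0, 0.0


def draw_imputation(delta_y1):
    """One imputation: tau ~ Uniform(lo_tau, hi_tau), then a truncated draw.

    The rejection step keeps the imputed value non-negative, mirroring the
    advice to avoid extrapolating outside a plausible range (quadriceps
    strength cannot be negative).
    """
    tau = random.uniform(lo_tau, hi_tau)
    while True:
        y2 = random.gauss(delta_y1 + tau, 1.0)
        if y2 >= 0.0:
            return y2


imputed = [draw_imputation(1.5) for _ in range(1000)]
print(min(imputed))
```

Drawing `tau` afresh at each iteration propagates the prior uncertainty about the MNAR departure into the posterior, rather than fixing it at a single value.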

Web Appendices and Tables referenced in Sections 2, 3, 5, 6, 7, and 8 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

Chenguang Wang, Division of Biostatistics, Center for Devices and Radiological Health, FDA, Silver Spring, Maryland 20993.

Michael J. Daniels, Department of Statistics, University of Florida, Gainesville, FL 32611.

- Azzalini A, Dalla Valle A. The multivariate skew-normal distribution. Biometrika. 1996;83:715–726.
- Celeux G, Forbes F, Robert C, Titterington D. Deviance information criteria for missing data models. Bayesian Analysis. 2006;1:651–674.
- Daniels M, Hogan J. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC; 2008.
- Daniels M, Wang C. Discussion of "Missing data in longitudinal studies: A review" by Ibrahim and Molenberghs. TEST. 2009;18:51–58.
- Harel O, Schafer J. Partial and latent ignorability in missing-data problems. Biometrika. 2009;96:37.
- Hogan J, Laird N. Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine. 1997;16:259–272.
- Ibrahim J, Chen M. Power prior distributions for regression models. Statistical Science. 2000;15:46–60.
- Kenward M, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal data. Statistical Methods in Medical Research. 1999;8:51.
- Kenward M, Molenberghs G, Thijs H. Pattern-mixture models with proper time dependence. Biometrika. 2003;90:53–71.
- Little R. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association. 1995;90
- Little R, Rubin D. Statistical Analysis with Missing Data. Wiley; 1987.
- Little R, Wang Y. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics. 1996;52:98–111.
- Molenberghs G, Kenward M. Missing Data in Clinical Studies. Wiley; 2007.
- Molenberghs G, Michiels B, Kenward M, Diggle P. Monotone missing data and pattern-mixture models. Statistica Neerlandica. 1998;52:153–161.
- Rubin D. Inference and missing data. Biometrika. 1976;63:581–592.
- Scharfstein D, Daniels M, Robins J. Incorporating prior beliefs about selection bias into the analysis of randomized trials with missing outcomes. Biostatistics. 2003;4:495.
- Scharfstein D, Rotnitzky A, Robins J. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association. 1999;94:1096–1146.
- Spiegelhalter D, Best N, Carlin B, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64:583–639.
- Thijs H, Molenberghs G, Michiels B, Verbeke G, Curran D. Strategies to fit pattern-mixture models. Biostatistics. 2002;3:245.
- Vansteelandt S, Goetghebeur E, Kenward M, Molenberghs G. Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica. 2006;16:953–979.
- Wang C, Daniels M, Scharfstein D, Land S. A Bayesian shrinkage model for incomplete longitudinal binary data with application to the breast cancer prevention trial. Journal of the American Statistical Association. 2010 (in press)
- Zhang J, Heitjan D. A simple local sensitivity analysis tool for nonignorable coarsening: application to dependent censoring. Biometrics. 2006;62:1260–1268.
