Growth modeling analyses based on maximum-likelihood estimation typically use the assumption of missing at random (MAR; Little & Rubin, 2002). This covers situations where dropout is predicted by previously observed outcome values. Missing data due to dropout may not, however, fulfill the MAR assumption but may call for missing data techniques that handle non-ignorable missingness, sometimes referred to as not missing at random (NMAR). NMAR arises if unobserved variables that are correlated with the outcome predict missingness, such as a high or a low outcome value that is not recorded because the subject drops out. The NMAR situation can be handled by using a “full-data” likelihood analysis which considers as data not only the outcomes but also 0/1 missing data indicators for each time point (see, e.g., Little, 2009).
Consider the full-data likelihood in symbolic form, where y refers to the outcome vector and m refers to the binary missing data indicators,

[y, m] = [y | m] [m], (1)

[y, m] = [m | y] [y], (2)

where the first expression refers to the “pattern-mixture modeling” approach (see, e.g., Little, 1995) and the second expression to the “selection modeling” approach (see, e.g., Diggle & Kenward, 1994). “Shared-parameter modeling” considers the likelihood factorization

[y, m] = Σ_c [y | c] [m | c] [c], (3)

where c represents latent class variables influencing both the outcomes y and the missingness indicators m. Shared-parameter modeling may also use latent variables in the form of random effects, replacing the sum in (3) with integrals. It should be noted that each NMAR model involves untestable assumptions due to missing data. It is therefore important to compare results from several different models to achieve a sensitivity analysis. Recent overviews of NMAR modeling are given in Albert and Follmann (2009) and Little (2009). In longitudinal studies, the non-ignorability concern is typically focused on missing data in the form of dropout, not intermittent missingness. This is also the focus of this study.
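The shared-parameter factorization (3) can be sketched numerically. The following is a minimal illustration, not the STAR*D model: a two-class latent variable c drives both a normal outcome mean and a Bernoulli missingness probability, with y and m assumed conditionally independent given c; all function names and parameter values are made up for illustration.

```python
import math

def normal_pdf(y, mu, sd):
    """Univariate normal density."""
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood(y_obs, m, class_probs, class_means, class_dropout):
    """Full-data likelihood for one subject under factorization (3):
    a sum over classes of [c] * [y | c] * [m | c].
    y_obs: the subject's observed outcomes;
    m: 0/1 missingness indicators (1 = missing)."""
    total = 0.0
    for p_c, mu, p_miss in zip(class_probs, class_means, class_dropout):
        lik_y = 1.0
        for y in y_obs:                      # [y | c], observed outcomes only
            lik_y *= normal_pdf(y, mu, 1.0)
        lik_m = 1.0
        for ind in m:                        # [m | c], Bernoulli indicators
            lik_m *= p_miss if ind == 1 else 1.0 - p_miss
        total += p_c * lik_y * lik_m         # weight by class probability [c]
    return total

# Two hypothetical classes: a "stayer" class (low dropout probability)
# and a "dropper" class (high mean, high dropout probability).
lik = likelihood(y_obs=[10.0, 8.0], m=[0, 0, 1],
                 class_probs=[0.7, 0.3],
                 class_means=[9.0, 14.0],
                 class_dropout=[0.1, 0.6])
print(lik)
```

The conditional independence of y and m given c is what makes the latent classes "shared parameters": all dependence between outcomes and missingness is channeled through c.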
Pattern-mixture modeling (see, e.g., Little, 1995; Hedeker & Gibbons, 1997; Demirtas & Schafer, 2003) considers the likelihood factorization [y, d] = [y | d] [d], where the d variables can be represented by dummy variables for dropout occasion. A simple version of the model allows the random effect means to vary as a function of the dropout dummy variables. A quadratic growth model is used for STAR*D.
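As a minimal sketch of the quadratic growth model, with random intercept i, linear slope s, and quadratic slope q having means mu_i, mu_s, and mu_q, the model-implied outcome mean at time t is mu_i + mu_s * t + mu_q * t^2. The parameter values below are made up for illustration and are not STAR*D estimates.

```python
# Illustrative quadratic growth mean curve over the six time points
# (t = 0, ..., 5): outcome mean = mu_i + mu_s * t + mu_q * t**2.
# Parameter values are hypothetical, not estimates from the paper.
mu_i, mu_s, mu_q = 16.0, -2.0, 0.12

means = [mu_i + mu_s * t + mu_q * t**2 for t in range(6)]
print([round(m, 2) for m in means])  # [16.0, 14.12, 12.48, 11.08, 9.92, 9.0]
```

The positive quadratic term makes the decline level off toward the later time points, the typical shape for a depression score trajectory under treatment.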
To facilitate understanding and comparisons of the alternative missing data models, the statistical description of the modeling is complemented by model diagrams. The pattern-mixture model is shown in model diagram form in the figure below, where squares represent observed variables and circles represent latent variables. Here, y0 – y5 represent the depression outcomes at baseline and through week 12, whereas i, s, and q represent the random intercept, linear slope, and quadratic slope. Note that these model diagrams are not geared towards causal inference. Single-headed arrows simply represent regression relationships and double-headed arrows represent correlations. The goal of the modeling is not to draw inference on causal effects, but to understand important sources of variation in the depression outcomes over time.
Pattern-mixture modeling (d’s are dropout dummy variables)
The y0 – y5 outcomes show a total of 34 missing data patterns in the total sample of n = 4041. The dummy dropout indicators dt are defined as dt = 1 for a subject who drops out after time t − 1 (t = 1, 2, …, 5) for the six time points. In the STAR*D data, the frequencies of dt = 1 are 420 (d1), 299 (d2), 301 (d3), 484 (d4), and 983 (d5). There are 995 subjects who have all d's equal to zero, that is, who do not drop out. The five dummy dropout indicators thereby define six subgroups of subjects. Intermittent missingness is observed for 559 subjects. These subjects are spread among five of the six subgroups, excluding the d1 = 1 subgroup. The dropout pattern is determined by each subject's last observed outcome. This implies that within each of the six subgroups, a subject with intermittent missingness is treated the same as a subject with complete data up to that point.
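The coding of the dropout dummy indicators described above can be sketched as follows. The helper function is illustrative, not from the original analysis; it assumes the baseline outcome y0 is observed for every subject. Because the dropout time is the last observed occasion, intermittent missingness before that point does not affect the d's.

```python
def dropout_dummies(observed):
    """observed: list of six 0/1 flags (1 = y observed at that time point),
    with the baseline y0 assumed observed.
    Returns [d1, ..., d5], where dt = 1 if the subject drops out after
    time t - 1, i.e., the last observed time point is t - 1."""
    last = max(t for t, obs in enumerate(observed) if obs)  # last observed occasion
    return [1 if last == t - 1 else 0 for t in range(1, 6)]

# Subject observed at baseline only: drops out after time 0, so d1 = 1.
print(dropout_dummies([1, 0, 0, 0, 0, 0]))  # [1, 0, 0, 0, 0]

# Intermittent missingness at time 2 but observed through time 4:
# treated the same as complete data up to time 4, so d5 = 1.
print(dropout_dummies([1, 1, 0, 1, 1, 0]))  # [0, 0, 0, 0, 1]

# Complete data: all d's equal to zero.
print(dropout_dummies([1, 1, 1, 1, 1, 1]))  # [0, 0, 0, 0, 0]
```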
The pattern-mixture model typically needs restrictions on the parameters across dropout patterns. For example, for individuals dropping out after the first time point, the linear and quadratic slope means are not identified, and for individuals dropping out after the second time point, the quadratic slope mean is not identified. In the current application of pattern-mixture modeling, these means are held equal to those of the pattern corresponding to dropping out one time point later. The random effect mean estimates are mixed over the patterns, weighting by the pattern proportions. This mixture can then be compared to the conventional single-class model estimated under MAR. The resulting estimated mean curve is shown in the figure below. It is seen that the estimated mean QIDS depression score at week 12 is somewhat higher using pattern-mixture modeling than using MAR, as would be expected if dropouts have higher QIDS scores. The week 12 QIDS standard deviation is 5.3, so that the difference is approximately half a standard deviation.
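The mixing step just described can be sketched numerically. The six subgroup counts are the STAR*D frequencies quoted above; the pattern-specific intercept means are made-up illustrative values, not estimates from the paper.

```python
# Pattern-mixture mixing sketch: combine pattern-specific random effect
# means into an overall mean, weighting by the pattern proportions.
counts = [420, 299, 301, 484, 983, 995]          # d1..d5 subgroups, then completers
total = sum(counts)
weights = [c / total for c in counts]            # pattern proportions

# Hypothetical pattern-specific intercept means (early dropouts higher,
# as would be expected if dropouts have higher depression scores).
intercept_means = [16.0, 15.0, 14.5, 14.0, 13.5, 12.5]

mixed_intercept = sum(w * m for w, m in zip(weights, intercept_means))
print(round(mixed_intercept, 2))
```

The same weighted mixing is applied to the slope and quadratic slope means, yielding the mixed mean curve that is compared to the single-class MAR curve.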
Depression mean curves estimated under MAR, pattern-mixture (PM), and Diggle-Kenward selection modeling (DK)
Selection modeling uses the likelihood factorization [y, d] = [y] [d | y], where the d's are survival indicators. An often-cited model for selection modeling is the one proposed by Diggle and Kenward (1994). A common form of the Diggle-Kenward selection model assumes the logistic regression model for dropout,

logit[P(dti = 1 | yti, yt−1,i)] = β0 + β1 yti + β2 yt−1,i, (4)

where the d variables are scored as discrete-time survival indicators, obtaining the value 0 for time periods before the dropout event occurs, 1 at the time period the dropout occurs, and missing for the time periods after the event occurs (Muthén & Masyn, 2005). Here, yti is missing for an individual who has dti = 1, that is, who drops out after t − 1. According to this model, MAR holds if β1 = 0, that is, dropout is a function of the last observed y value, not the current latent y value.
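The discrete-time survival coding described above (cf. Muthén & Masyn, 2005) can be sketched as follows; the helper function is illustrative, with missing scored as None.

```python
def survival_indicators(dropout_time, n_periods=5):
    """Discrete-time survival indicators: 0 before the dropout event,
    1 at the period the dropout occurs, missing (None) afterwards.
    dropout_time: 1-based period of dropout, or None for a subject
    who never drops out. Returns [d1, ..., d_{n_periods}]."""
    if dropout_time is None:
        return [0] * n_periods
    ds = []
    for t in range(1, n_periods + 1):
        if t < dropout_time:
            ds.append(0)          # event has not yet occurred
        elif t == dropout_time:
            ds.append(1)          # event occurs in this period
        else:
            ds.append(None)       # after the event: scored as missing
    return ds

print(survival_indicators(3))     # [0, 0, 1, None, None]
print(survival_indicators(None))  # [0, 0, 0, 0, 0]
```

With this coding, each subject contributes one Bernoulli term per at-risk period to the dropout part of the likelihood, which is what makes the logistic regression in (4) a discrete-time survival model.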
To complete the model, a quadratic growth curve is used as before for the STAR*D data. The model is shown in diagram form in the figure below. Circles within squares represent variables that are not observed for the subjects who drop out.
Diggle-Kenward selection modeling (d’s are survival indicators)
Applying the Diggle-Kenward model to the STAR*D data gives a significant positive maximum-likelihood (ML) estimate of β1. The significance of β1 suggests that NMAR modeling is of interest. The estimated mean depression curve for the Diggle-Kenward model is shown in the figure above, with a trajectory similar to that of the pattern-mixture model.
It should be noted that MAR, pattern-mixture, and Diggle-Kenward selection modeling use different assumptions. It is instructive to consider the Diggle-Kenward model. If the model is correctly specified, β1 ≠ 0 in (4) may be viewed as an indication of NMAR. This, however, relies on the Diggle-Kenward model's untestable assumptions about the selection process (4) and on the normality assumptions made for unobserved variables. As stated by Little (1994) in the discussion of the article:
“Consider a single drop-out time, and let Y1 denote the (fully observed) variables up to drop-out and Y2 the (incompletely observed) variables after drop-out. The data clearly supply no direct information about the distribution of Y2 given Y1 for subjects who drop out. … differences in the distribution of Y2 given Y1 for those who do and do not drop out are solely determined by distributional assumptions of the model, such as the form of the model for drop-outs, normality, or constraints on the mean and covariance matrix.”
Considering “the distribution of Y2 given Y1 for those who do and do not drop out” implies a conditioning on dropout, which means that the assumption of logistic regression for dropout is involved. Consequently, β1 ≠ 0 cannot be seen as a test rejecting MAR, because assumptions not included in MAR are added in the Diggle-Kenward model.
MAR also makes normality assumptions for the outcomes, which are untestable for the outcomes after dropout, but the MAR assumptions do not involve a logistic regression for dropout. Pattern-mixture modeling also makes normality assumptions for the outcomes, but the normality is conditional on dropout patterns. All in all, this implies that none of the models is a special case of another. The models should all be applied and their results compared.