6.1 Parametric selection model analysis

One approach that has been proposed in the statistical literature is to specify parametric models for both the conditional distribution of *R* given *Y* and the marginal distribution of *Y* (Diggle and Kenward, 1994). For ACTG 175, it might be assumed that *Y* is log-normal with parameter vector **θ** = (*γ, τ*^{2}) and that the conditional distribution of *R* given *Y* follows (3) with *q*(*Y*) = *α* log(*Y*). Due to the log-normal assumption, *α* is identified. Inference could then proceed by maximum likelihood. By letting *df* be very large and assuming relatively non-informative priors on *α, η*, and **θ**, we know that the posterior distributions of *α* and *μ* = exp(*γ* + *τ*^{2}/2) will approximate the marginal distributions of their maximum likelihood estimators. We performed such an analysis by letting *df* = 10 000, *π*(*α*) ~ *N*(0, 10.0), *π*(*η*) ~ *N*(0, 100), *π*(*γ*) ~ *N*(5.5, 1.0) and *π*(1/*τ*^{2}) ~ *G*(1, 10), where *N*(*a, b*) denotes a normal distribution with mean *a* and variance *b* and *G*(*c, d*) denotes a gamma distribution with shape *c*, scale *d*, mean *cd* and variance *cd*^{2}. The prior mean for *γ* was chosen based on the observed overall mean of log CD4 cells at baseline and the fact that the overall health of the cohort was expected to decline over the study period; the prior mean of 1/*τ*^{2} was set equal to the inverse of the overall variance of log CD4 cells at baseline; the prior variances of *γ* and 1/*τ*^{2} were chosen large enough so that the data and modeling assumptions would ultimately determine the posterior distributions of *γ* and *τ*^{2}.
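As a small numerical illustration of the quantities defined above, the following sketch evaluates the log-normal mean *μ* = exp(*γ* + *τ*^{2}/2) and the moments of the *G*(1, 10) prior on 1/*τ*^{2}. The function names and the plugged-in values of *γ* and *τ*^{2} are hypothetical and chosen only for illustration; they are not estimates from the analysis.

```python
import math


def lognormal_mean(gamma, tau2):
    """Mean of Y when log(Y) ~ N(gamma, tau2): exp(gamma + tau2 / 2)."""
    return math.exp(gamma + tau2 / 2.0)


def gamma_mean_variance(c, d):
    """Mean c*d and variance c*d**2 of G(c, d), as defined in the text."""
    return c * d, c * d ** 2


# Prior on the precision 1/tau^2 used in the text: G(1, 10).
prior_mean, prior_var = gamma_mean_variance(1, 10)

# Illustrative plug-in values only (assumption, not results from the paper):
# gamma at its prior mean, tau^2 at the reciprocal of the prior mean precision.
mu_example = lognormal_mean(5.5, 1.0 / prior_mean)
print(prior_mean, prior_var, round(mu_example, 1))
```

Because *μ* depends on both *γ* and *τ*^{2}, its posterior has to be obtained by transforming joint draws of (*γ*, *τ*^{2}) rather than by plugging in marginal summaries.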

In , we present the treatment-specific posterior distributions of *α* and *μ* (dashed lines), based on *K* = 10 000 iterations (first 1000 iterations discarded; overall acceptance rate in Step 2 of the algorithm was 72.6% and 74.8% in the AZT+ddI and ddI arms, respectively). The approximate maximum likelihood estimates (standard errors; 95% credible intervals) for *α* are –2.58 (0.24; [–3.00, –2.09]) and –2.76 (0.26; [–3.25, –2.22]) for the AZT+ddI and ddI arms, respectively. Since both intervals exclude zero, under the log-normality assumption we would reject the treatment-specific null hypothesis of missing at random. The corresponding estimates for *μ* are 303.42 (13.63; [277.67, 331.20]) and 296.68 (13.51; [271.49, 324.17]). The posterior distribution of the difference between the means in the AZT+ddI and ddI arms is displayed in . The approximate maximum likelihood estimate of the difference is 6.74 (19.27; [–30.89, 43.92]), suggesting no significant treatment effect.

To check the model fit, we use the posterior predictive checking approach described in Section 5.5. The first row of displays the empirical distribution of the observed CD4 counts at week 56 (solid line) along with 100 empirical distribution functions of observed outcomes drawn from its posterior predictive distribution. As the figure illustrates, the model does not fit the observed data well. To achieve a better fit, we tried two alternative models for the marginal distribution of *Y*. In particular, we let the cube root and square root of *Y* be normally distributed. These models fit slightly better than the log-normal, but still did not fit the observed data well. Under these alternative models, the approximate maximum likelihood estimates and confidence intervals for *α* were of a similar order of magnitude to those under the log-normal model.

6.2 Frequentist non-parametric sensitivity analysis

When the distribution of the CD4 count at week 56 is left unspecified, we saw in Section 4 that *α* is not identified. In the absence of such distributional information, the best that can be achieved from a frequentist perspective is to perform a sensitivity analysis. In , we present the treatment-specific estimated overall mean CD4 count at week 56 (with 95% confidence intervals) as a smooth function of the selection bias parameter *α* (*α* ranges from –2.0 to 2.0). Note how the sampling variability (as indicated by the width of the confidence intervals) is dominated by the uncertainty in *α*. That is, the width of any given confidence interval is approximately 40–50 CD4 cells, while the range of estimated means across *α* is approximately 200–300 CD4 cells. Common statistical practice is to simply report one of these confidence intervals, which can grossly misrepresent treatment efficacy.
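To make the dependence of the estimated mean on *α* concrete, the sketch below traces a sensitivity curve on toy data. It rests on an assumption about model (3), which is not reproduced in this section: that drop-out follows a logistic model in which *q*(*Y*) = *α* log(*Y*) enters additively on the logit scale. Under that assumption, the distribution of *Y* among drop-outs is the distribution among completers reweighted by exp{*q*(*Y*)} = *Y*^{α}. The function name, the data and the grid are hypothetical, and this is not claimed to be the estimator of Section 4; confidence intervals are omitted.

```python
def tilted_mean(y_obs, n_missing, alpha):
    """Overall mean when drop-outs follow the completers' distribution
    reweighted by y**alpha (assumed logistic selection model)."""
    weights = [y ** alpha for y in y_obs]
    # Mean of Y imputed to drop-outs under the exponential tilt.
    mean_missing = sum(w * y for w, y in zip(weights, y_obs)) / sum(weights)
    n_obs = len(y_obs)
    mean_obs = sum(y_obs) / n_obs
    p_obs = n_obs / (n_obs + n_missing)
    return p_obs * mean_obs + (1.0 - p_obs) * mean_missing


# Hypothetical week-56 CD4 counts for completers and a hypothetical
# number of drop-outs; these are not data from ACTG 175.
toy_cd4 = [120.0, 250.0, 310.0, 400.0, 520.0]
toy_missing = 3

for alpha in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(alpha, round(tilted_mean(toy_cd4, toy_missing, alpha), 1))
```

At *α* = 0 the weights are constant and the estimate reduces to the completers' mean, which is the missing at random answer; negative *α* shifts the imputed drop-out distribution toward lower counts and lowers the overall mean.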

In , we present a contour plot of the *Z*-statistic (estimated difference in means divided by standard error of the difference) associated with the test of the null hypothesis of no treatment difference as a function of treatment-specific selection bias parameters. On the horizontal (vertical) axis, we vary the selection bias parameter for the AZT+ddI (ddI) arm. Regions marked with a treatment label indicate that, for selection bias parameter combinations in the region, a 0.05 level test of the null hypothesis would be rejected in favor of that treatment. The solid point in the contour plot indicates the result from the missing at random assumption in both treatment groups. The conclusion from this contour plot is that with mild levels of differential selection bias in the treatment arms (e.g. *α* = 0 in the ddI arm and *α* = –0.025 in the AZT+ddI arm), we would change the conclusion based on the default analysis. As a result, the evidence in favor of AZT+ddI appears to be ‘weaker’ than that based on missing at random. As we see, this sensitivity analysis may be viewed as limited in its use since it does not provide an overall quantification of the strength of evidence, accounting for uncertainty in beliefs about selection bias.
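The quantity plotted in the contour plot can be sketched as follows. The code computes the *Z*-statistic from arm-specific estimates and standard errors and labels the outcome of a two-sided 0.05 level test. It assumes the two arms are independent, so that the variance of the difference is the sum of the arm-specific variances; the function names and the inputs are hypothetical.

```python
import math


def z_statistic(mean_a, se_a, mean_b, se_b):
    """Difference in means divided by its standard error, assuming the
    two arms are independent (variances add)."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def favored_arm(z, label_a="AZT+ddI", label_b="ddI", critical=1.959964):
    """Label the region of the contour plot for a two-sided 0.05 level test."""
    if z > critical:
        return label_a
    if z < -critical:
        return label_b
    return "neither"


# Hypothetical inputs: one pair of estimated means and standard errors,
# as would be obtained at a single (alpha_AZT+ddI, alpha_ddI) combination.
z = z_statistic(10.0, 3.0, 0.0, 4.0)
print(z, favored_arm(z))
```

Repeating this calculation over a grid of selection bias parameter pairs, with the means and standard errors re-estimated at each pair, gives the surface whose contours separate the labeled regions.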

6.3 Flexible Bayesian analysis

When a decision is required, the flexible Bayesian methodology described in Section 5 can be used to summarize the treatment efficacy in the presence of uncertainty regarding the distribution of the outcome and the level of selection bias.

For each treatment group, we assumed that the distribution of *Y*, *F*, followed a Dirichlet process mixture prior with precision *df* = 1, and log-normal base measure with parameter vector *θ* = (*γ, τ*^{2}). We assumed that the hyper-priors for *γ* and *τ*^{2} were independent. In particular, we let *π*(*γ*) ~ *N*(5.5, 1) and *π*(1/*τ*^{2}) ~ *G*(1, 10). In addition, we assumed an independent, non-informative, normal prior on *η*, i.e. *π*(*η*) ~ *N*(0, 100). For *α*, we use *π*(*α*) ~ *N*(–0.5, 0.25^{2}). That is, the prior belief is that subjects with lower CD4 counts (under full compliance) at week 56 are more likely to drop out. Specifically, the prior states that there is a 95% chance that the odds ratio of drop-out for subjects with a two-fold change in CD4 cells at week 56 is between 1 and 2, with a most probable value around 1.4.
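The odds-ratio interpretation of the prior on *α* can be checked with a short calculation. The sketch assumes that *q*(*Y*) = *α* log(*Y*) enters the log-odds of drop-out additively, so that halving the CD4 count multiplies the odds of drop-out by 2^{–α}. The function name is hypothetical, and the 95% range uses the usual normal approximation of the mean plus or minus 1.96 standard deviations.

```python
import math


def dropout_odds_ratio(alpha, fold=2.0):
    """Odds ratio of drop-out for a subject whose CD4 count is lower by a
    factor `fold`, assuming log-odds of drop-out include alpha * log(Y)."""
    return fold ** (-alpha)


prior_mean, prior_sd = -0.5, 0.25

# Central 95% range of the N(-0.5, 0.25^2) prior on alpha.
alpha_low = prior_mean - 1.96 * prior_sd
alpha_high = prior_mean + 1.96 * prior_sd

or_center = dropout_odds_ratio(prior_mean)   # 2 ** 0.5
or_upper = dropout_odds_ratio(alpha_low)     # most negative alpha
or_lower = dropout_odds_ratio(alpha_high)    # alpha closest to zero
print(round(or_lower, 2), round(or_center, 2), round(or_upper, 2))
```

The central value is the square root of 2, about 1.4, and the two endpoints fall just inside the interval from 1 to 2, in agreement with the verbal description of the prior.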

displays the treatment-specific posterior distributions of *α* and *μ* (solid lines), based on *K* = 10 000 iterations (first 1000 iterations discarded; acceptance rate in Step 2 of the algorithm was 80.0% and 80.3% in the AZT+ddI and ddI arms, respectively). The prior distribution of *α* is shown as the dotted line, so that the prior and posteriors can be compared directly. The posterior means (95% credible intervals) for *α* are –0.50 ([–0.90, –0.14]) and –0.49 ([–0.83, –0.15]) for the AZT+ddI and ddI arms, respectively. For comparison, the prior mean (95% credible interval) for *α* was –0.50 ([–1.0, 0.0]) for both treatment arms. The treatment-specific posterior distributions of *α* are tighter than the prior distributions, indicating a weak level of induced *a priori* dependence between *α* and (*p, F*_{1}). The posterior means (95% credible intervals) for *μ* are 368.24 ([342.43, 390.57]) and 348.00 ([330.40, 365.40]) for the AZT+ddI and ddI arms, respectively. displays the posterior distribution of the difference between *μ*(*AZT* + *ddI*) and *μ*(*ddI*) (solid line). The posterior mean (95% credible interval) is 20.24 ([–11.12, 48.63]). The posterior probability that *μ*(*AZT* + *ddI*) is greater than *μ*(*ddI*) is 90.76%. After accounting for prior beliefs regarding selection bias, there appears to be relatively strong evidence in favor of combination therapy.
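Summaries of this kind are simple functions of the retained posterior draws. The following sketch shows one way to compute the posterior mean, an equal-tailed 95% credible interval and the posterior probability of a positive difference from two sets of draws of *μ*. It assumes the arms are analyzed separately, so that draws can be paired by iteration; the draws generated here are synthetic stand-ins, not output from the sampler of Section 5, and all names are hypothetical.

```python
import random


def quantile(sorted_vals, p):
    """Linear-interpolation quantile of a sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def summarize_difference(draws_a, draws_b):
    """Posterior mean, 95% equal-tailed interval and P(difference > 0)."""
    diffs = sorted(a - b for a, b in zip(draws_a, draws_b))
    n = len(diffs)
    return {
        "mean": sum(diffs) / n,
        "interval": (quantile(diffs, 0.025), quantile(diffs, 0.975)),
        "prob_positive": sum(d > 0 for d in diffs) / n,
    }


# Synthetic draws for illustration only (hypothetical location and spread).
random.seed(0)
draws_a = [random.gauss(10.0, 5.0) for _ in range(9000)]
draws_b = [random.gauss(0.0, 5.0) for _ in range(9000)]
print(summarize_difference(draws_a, draws_b))
```

The posterior probability is just the fraction of retained iterations in which the draw for one arm exceeds the draw for the other, which is why it can be reported alongside the credible interval at no extra computational cost.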

In , we display the treatment-specific convergence diagnostic described in Section 5.4 for *α* = –0.8, –0.65, –0.5, –0.35, –0.2. The solid line is the estimated distribution of *π*(*μ*|**O**, *α*) based on our non-parametric maximum likelihood results of Section 4 and the dotted line is the estimated distribution based on the Gibbs sampling scheme. The two estimates are quite close, providing support for convergence. The second row of displays the empirical distribution of the observed CD4 counts at week 56 (solid line) along with 100 empirical distribution functions of observed outcomes drawn from its posterior predictive distribution. As the figure illustrates, the model, as expected, fits the observed data well.

6.4 Summary and comparison of approaches

In evaluating treatment effects in the presence of missing data, the analyst usually starts with the default missing at random analysis. Under missing at random, the estimated means are 385 and 360 in the AZT+ddI and ddI arms, respectively. The difference in means is 25 CD4 cells and the null hypothesis of no treatment difference is rejected.

Recognizing that missing at random is likely to fail, the analyst might consider a parametric selection model analysis, similar to the one conducted in Section 6.1. There are two important lessons to be learned from this analysis. The first lesson is that model selection is difficult and model checking is critical. In our analysis, we found that some of the models typically used to fit CD4 count data did not fit the observed data well. To achieve better fits, one needs to either use a more flexible model for the distribution of *Y* or fit a more flexible form for the selection bias function *q*(*Y*). The second lesson is that the distributional form of *Y* can determine the magnitude of selection bias and one must make sure that the level of selection bias is substantively plausible. Using the log-normal assumption, the estimated values of *α* are –2.58 and –2.76 in the AZT+ddI and ddI arms, respectively. Missing at random is rejected in both treatment arms. These levels of selection bias are enormous; the imputed distribution of *Y* among drop-outs is more extreme than the left-most histograms in . They correspond to the belief that for two subjects who have a two-fold difference in CD4 cells at week 56, the sicker subject is 6.0 and 6.7 times as likely to drop out in the AZT+ddI and ddI arms, respectively. These levels of selection bias are highly implausible. With these levels, the estimated means are 303 and 297 for the AZT+ddI and ddI arms. This is a huge reduction from the estimated means under missing at random. The difference in means is 6.7 CD4 cells and is not statistically significant. Similar results were observed in the alternative models we considered. While one can argue that these levels of selection bias are due to ill-fitting models, we conjecture that one can posit a more flexible model for the marginal distribution of *Y* which fits the data well but nevertheless still identifies a highly implausible level of selection bias.

The frequentist non-parametric analysis in Section 6.2 suggested that evidence in favor of AZT+ddI is weaker than that provided by the missing at random analysis. However, the drawback of the sensitivity analysis is that a single answer is not provided and the level of uncertainty is not quantified. Using treatment-specific informative priors on *α*, the flexible Bayesian analysis of Section 6.3 provides a quantification of uncertainty through posterior distributions. In this analysis, the treatment-specific distributions for *α* are slightly narrower than the prior specifications, indicating that, within the context of our fully Bayesian model, the data provide relatively little information about *α*. The estimated means are 368 and 348 in the AZT+ddI and ddI arms, which are, as expected, lower than the missing at random means, and more plausible than those from the fully parametric analysis. The posterior distributions (solid lines in ) indicate the degree of uncertainty regarding the treatment-specific means. The degree of uncertainty is comparable to that provided by parametric analysis and much larger than that of the missing at random analysis. The estimated mean difference is 20 CD4 cells and the posterior distribution for the difference (solid line in ) indicates the level of uncertainty as reflected by the span of the 95% credible interval and the 91% chance that AZT+ddI is superior to ddI. This is a concise representation of the strength of evidence regarding the treatment effect. As this analysis incorporates prior beliefs about selection bias and fits the observed data well, it may be more plausible than the missing at random analysis.