Biostatistics. Author manuscript; available in PMC 2009 September 22.

PMCID: PMC2748253

NIHMSID: NIHMS143244

DANIEL O. SCHARFSTEIN, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Email: dscharf@jhsph.edu


In randomized studies with missing outcomes, non-identifiable assumptions are required to hold for valid data analysis. As a result, statisticians have been advocating the use of sensitivity analysis to evaluate the effect of varying assumptions on study conclusions. While this approach may be useful in assessing the sensitivity of treatment comparisons to missing data assumptions, it may be dissatisfying to some researchers/decision makers because a single summary is not provided. In this paper, we present a fully Bayesian methodology that allows the investigator to draw a ‘single’ conclusion by formally incorporating prior beliefs about non-identifiable, yet interpretable, selection bias parameters. Our Bayesian model provides robustness to prior specification of the distributional form of the continuous outcomes.

In randomized studies with missing outcomes, it is well known that non-identifiable assumptions (e.g. missing at random; Rubin, 1976) are required to hold for valid data analysis. The degree to which these untestable assumptions are believed can have a substantial impact on study conclusions. With this in mind, statisticians have been advocating the use of sensitivity analysis to evaluate the effect of varying assumptions on study conclusions. For example, Rotnitzky *et al*. (1998, 2001), Scharfstein *et al*. (1999), and Robins *et al*. (2000) adopted a selection modeling approach, while Rubin (1977), Little (1994) and Daniels and Hogan (2000) used a pattern-mixture formulation. These approaches rely heavily on expert opinions about plausible ranges for non-identifiable, yet interpretable, sensitivity analysis parameters.

While the above methodological developments are useful in assessing the sensitivity of treatment comparisons to missing data assumptions, they may be dissatisfying to some researchers/decision makers because a single summary is not provided. A fully Bayesian analysis allows the investigator to draw a ‘single’ conclusion by formally incorporating prior beliefs about model parameters. For categorical outcomes, Robins *et al*. (1999) and Raab and Donnelly (1999) developed fully Bayesian selection modeling approaches, while Forster and Smith (1998) developed a pattern-mixture approach. For continuous outcomes, Lee and Berger (2001), building on the work of Bayarri and DeGroot (1987) and Bayarri and Berger (1998), developed a semiparametric Bayesian selection modeling approach, which places strong distributional assumptions on the outcome and weak distributional assumptions on the selection mechanism. In this paper, we consider the continuous outcome setting but take the opposite tack from Lee and Berger (2001). That is, we place strong prior restrictions on the selection mechanism, but relax the distributional restrictions on the outcome. Our tack is motivated by the fact that, in the clinical trial setting, investigators may have firmer beliefs about the selection mechanism than about the distributional form of the outcome. The flexibility we seek makes the problem challenging. As a result, we restrict ourselves to the setting in which additional covariate information is ignored. By closely examining this scenario, we will gain insight into the more difficult and realistic setting, in which covariate information is utilized. This latter setting will be addressed in a sequel.

The paper is organized as follows. In Section 2, we describe an AIDS clinical trial which will provide context for the methods discussed throughout. In Section 3, we formalize the data structure of the AIDS study. In Section 4, we review the frequentist, non-parametric sensitivity analysis approach of Rotnitzky and colleagues. This review provides a backdrop for our flexible Bayesian approach, developed in Section 5. In Section 6, we analyze the AIDS data from both the frequentist and Bayesian perspective and compare results. Section 7 is devoted to a discussion.

ACTG 175 was a randomized, double-blind trial designed to evaluate nucleoside monotherapy versus combination therapy in HIV-infected individuals with CD4 counts between 200 and 500 cells mm^{–3}. A total of 2467 subjects were randomized to one of four treatment arms: 619 to AZT (600 mg a day) alone, 613 to AZT (600 mg a day) + ddI (400 mg a day), 615 to AZT (600 mg a day) + ddC (2.25 mg a day), and 620 to ddI (400 mg a day) alone (Hammer *et al*., 1996). CD4 counts were scheduled to be collected at baseline, week 8, and then every 12 weeks thereafter. Additional baseline characteristics were also collected. In the interest of space, we focus attention on the AZT+ddI and ddI treatment arms. Also, we ignore all recorded information except the CD4 count to be measured at week 56.

One goal of the investigators was to compare the treatment-specific distributions of CD4 cell count at week 56 had all subjects remained on their assigned treatment through that week. Thus, it is useful to define a completer as a subject who stays on therapy and is measured at week 56; otherwise, we define the subject as a drop-out. In this paper, we do not distinguish between the multiple causes of drop-out. The percentage of drop-outs in the AZT+ddI and ddI arms is 33.6% and 26.5%, respectively. To address the above objective, a completers-only analysis is usually performed. The mean CD4 count at week 56 for completers (standard error) is 384.96 (8.53) and 359.59 (7.67) in the AZT+ddI and ddI arms, respectively. The difference in means is 25.36 and the associated 95% confidence interval is (2.87, 47.85); a test of the null hypothesis of no treatment difference has an associated *p*-value of 0.027, taken to be evidence of the superiority of AZT+ddI over ddI. The above estimates of the means, under full completion, are only valid if the completers and drop-outs are similar on measured (ignored) and unmeasured characteristics (i.e. missing at random). This latter, non-identifiable assumption is unlikely to hold, as it is well known from other studies that drop-outs tend to be very different from completers. Our goal is to present two alternative and complementary analysis strategies for the ACTG 175 data. The first approach is frequentist, while the second is Bayesian.
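The completers-only comparison above is plain arithmetic on the reported summary statistics. The following sketch, using a normal approximation with the means and standard errors quoted in the text, reproduces the reported interval and *p*-value up to rounding:

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Treatment-specific completer means and standard errors reported in the text.
mean_azt_ddi, se_azt_ddi = 384.96, 8.53
mean_ddi, se_ddi = 359.59, 7.67

diff = mean_azt_ddi - mean_ddi              # difference in completer means
se_diff = sqrt(se_azt_ddi**2 + se_ddi**2)   # SE of the difference (independent arms)

ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)  # 95% Wald interval
z = diff / se_diff
p_value = 2.0 * (1.0 - normal_cdf(z))       # two-sided test of no difference
```

Up to rounding of the reported means, this recovers the difference of about 25.4, the interval (2.9, 47.9), and the *p*-value of 0.027 quoted above.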

We focus on an individual treatment group. We let *Y* denote the CD4 count that would be observed at week 56 under full compliance with assigned therapy. Let *R* be the completion indicator, so that *R* = 1 if the subject is a completer and *R* = 0 if he is a drop-out. Thus, *Y* is observed when *R* = 1 and missing when *R* = 0. Ignoring all other recorded information, we think of *C* = (*R, Y*) and *O* = (*R, Y* : *R* = 1) as the complete and observed data for an individual, respectively. We assume that {*C _{i}* = (*R _{i}*, *Y _{i}*) : *i* = 1, . . . , *n*} are *n* i.i.d. copies of *C*, and we let * O* = {*O _{i}* : *i* = 1, . . . , *n*} denote the corresponding observed data.

Let *f* be the probability density function (pdf) of *Y, f*_{1} be the conditional pdf of *Y* among completers, and *f*_{0} be the conditional pdf of *Y* among drop-outs. Let *F, F*_{1}, and *F*_{0} be the corresponding cumulative distribution functions (cdf). Let *p* = *P*[*R* = 1]. With this notation, note that the observed data law, *G _{O}*, is characterized by (*p*, *F*_{1}).

The goal is to use * O* to draw inference about a functional of the distribution *F* of *Y*; in particular, we focus on the mean, *μ* = *μ*(*F*).

Here, we review the main results of the non-parametric, sensitivity analysis methodology of Rotnitzky and colleagues. Our exposition is intended to provide background and motivation for the Bayesian development in Section 5.

It is well known that the law *G _{O}* of the observed data places no restrictions on *f*_{0}, the distribution of the outcome among drop-outs. As a result, *F*, and hence *μ*(*F*), is not identified without additional assumptions.

One way to identify the distribution of *Y* is to place just enough restrictions on the complete data laws to identify *f*, without restricting the laws of the observed data. Towards this end, Rotnitzky and colleagues assume a relationship between *f*_{0} and *f*_{1}. In particular, they specify a function *q* and postulate that

*f*_{0}(*y*) = *f*_{1}(*y*) exp{*q*(*y*)} / ∫ *f*_{1}(*u*) exp{*q*(*u*)} d*u*. (1)

It is important to recognize that implicit in the above relationship is the non-identifiable assumption that the support of the distribution of *Y* among completers is the same as that for drop-outs.
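The relationship above is an exponential tilting of the completer density: reweight *f*_{1} by exp{*q*(*y*)} and renormalize. A minimal numerical sketch (the grid and densities are purely illustrative, not ACTG 175 data):

```python
from math import exp, log

# Illustrative completer density f1 on a discrete grid of outcome values.
ys = [100.0, 200.0, 400.0, 800.0]
f1 = [0.2, 0.3, 0.3, 0.2]

def tilt(f1, ys, q):
    """Drop-out density f0 obtained by tilting f1 with exp{q(y)} and renormalizing."""
    w = [p * exp(q(y)) for p, y in zip(f1, ys)]
    total = sum(w)
    return [wi / total for wi in w]

# q(y) = alpha * log(y) with alpha < 0 shifts mass toward low outcomes.
alpha = -0.5
f0 = tilt(f1, ys, lambda y: alpha * log(y))

# With q identically 0 (missing at random), f0 equals f1.
f0_mar = tilt(f1, ys, lambda y: 0.0)
```

Note that the normalizing constant makes *f*_{0} a proper density regardless of the sign or size of *α*.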

For any *G _{O}* and each specified *q*, there exists a unique complete data law consistent with (1). In particular, the cdf of *Y* is given by

*F*(*y*) = *p* *F*_{1}(*y*) + (1 – *p*) ∫_{–∞}^{*y*} exp{*q*(*u*)} d*F*_{1}(*u*) / ∫_{–∞}^{∞} exp{*q*(*u*)} d*F*_{1}(*u*). (2)

The restriction that (1) places on the laws of the complete data is equivalent to the following logistic regression restriction:

log{*P*[*R* = 0 | *Y* = *y*] / *P*[*R* = 1 | *Y* = *y*]} = *η* + *q*(*y*), (3)

where *η* is an unknown scalar parameter. Thus, there exists a one-to-one *q*-dependent mapping between *G _{O}* and the law of the complete data *C*.

In the frequentist paradigm, there is thought to be one true function *q* which generated the complete data. However, the observed data contain no evidence about this function *q*. Within the context of this paradigm, Rotnitzky and colleagues recommended repeating the analysis over a range of *q* judged plausible by field experts. The following section discusses the two ways in which *q* can be interpreted. This may help in the elicitation of reasonable ranges from the experts. Finally, it is important to note that *q* would be partially or wholly identified if additional modeling assumptions were imposed. For example, suppose it is assumed that the marginal distribution of *Y* is symmetric. Then, when *q* is a constant function, *f*_{1} must be symmetric. If the empirical distribution of *Y* among completers is skewed, then we have evidence that *q* is non-constant. In Section 6, we show that assuming log(*Y*) follows a normal distribution is sufficient to identify the magnitude of selection bias. Our belief is that it is rare that one would have such strong *a priori* knowledge about the distribution of *Y* that it should be used to identify *q*.

From (1), we see that *q* indicates how the distribution of *Y* among completers relates to the distribution of *Y* among drop-outs. Equation (3) tells us that *q* quantifies the influence of the outcome *Y* on the odds that subjects drop out. From this latter interpretation, we refer to *q* as a selection bias function.

Positing that *q* = 0 is equivalent to missing at random, which in this setting is the same as the missing completely at random assumption (Rubin, 1976). This equivalence follows since the selection model (3) does not then depend on the observed data. Using the pattern mixture representation (1), *q* = 0 says that the distribution of *Y* among drop-outs is the same as that of completers. From the selection bias representation, *q* = 0 says that *Y* has no influence on a subject's completion probability. When *q* is non-constant, the outcome *Y* is said to be missing not at random.

There are obviously an infinite number of choices for *q*. In conducting a sensitivity analysis, it is useful to restrict attention to a simple class of selection bias functions, which includes missing at random. In addition, subject-matter experts need a clear and meaningful parametrization of the selection bias function in order to encode their beliefs. The dimension of the parametrization needs to be relatively low, because otherwise researchers may be hard pressed to encode their beliefs in high dimensions. While some analysts may feel uncomfortable focusing on a particular parametrization, it is important to recognize that the aim is not to find the ‘truth’ (since it is unknowable without random follow-up sampling), but to report an analysis which reasonably reflects an expert's beliefs about selection bias.

In our data analysis, we consider the class of functions *q*(*y*; *α*) = *α* log(*y*), indexed by a selection bias parameter *α* (note that we could have considered alternative parametrizations to reflect additional nonlinearities considered plausible by experts, e.g. piecewise linear). In this class, *α* = 0 is equivalent to missing at random and *α* ≠ 0 is equivalent to missing not at random. For given *α*, we denote the mapping from the observed data distribution to the complete data distribution by *Φ _{α}*. From (3), the parameter *α* is the increase in the log odds of drop-out associated with a unit increase in log(*Y*); equivalently, *c*^{*α*} is the odds ratio of drop-out comparing a subject whose outcome is *cy* to one whose outcome is *y*.
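With *q*(*Y*) = *α* log(*Y*), as used in our analysis, *α* has a direct odds-ratio reading: comparing two subjects whose outcomes differ by a factor *c*, the odds of drop-out differ by the factor *c*^{*α*}, since *η* cancels in the ratio. A small sketch (the function name is ours):

```python
from math import exp, log

def dropout_odds_ratio(alpha, fold_change):
    """Odds ratio of drop-out comparing outcomes y*fold_change vs y under
    logit P[R=0 | Y=y] = eta + alpha*log(y); eta cancels in the ratio."""
    return exp(alpha * log(fold_change))

# alpha = 0 recovers missing at random: outcome has no effect on drop-out.
assert dropout_odds_ratio(0.0, 2.0) == 1.0

# alpha = -1: a subject with half the CD4 count has twice the odds of drop-out.
or_half = dropout_odds_ratio(-1.0, 0.5)
```

This reading is what makes eliciting expert beliefs about *α* practical, as discussed in Section 5.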

To estimate *F*, under restrictions (1) and (3), we use non-parametric maximum likelihood. We estimate *F*(*y*) by *F _{n}*(*y*; *α*), the non-parametric maximum likelihood estimator for the given value of *α*.

It is straightforward to prove that √*n*{*μ*(*F _{n}*; *α*) – *μ*(*F*)} converges to a mean zero normal distribution, with an influence curve that can be written in closed form.

The asymptotic variance of *μ*(*F _{n}*; *α*) is then equal to the variance of this influence curve, and it can be estimated by the empirical variance of the estimated influence curve evaluated at the observations.
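Under restriction (1) with *q*(*y*) = *α* log(*y*), the drop-outs' distribution reweights each completer outcome *y* by *y*^{*α*}, so a non-parametric estimate of the overall mean mixes the completer mean with a tilted completer mean. The following is a sketch of such an estimator under these assumptions (our notation, not the authors' code):

```python
def mean_under_alpha(y_completers, n_total, alpha):
    """Estimate the overall mean of Y under tilting parameter alpha.

    Completers' distribution is estimated by the empirical distribution;
    drop-outs' distribution reweights each completer value y by y**alpha
    (the exp{alpha*log(y)} tilt). A sketch, not the authors' exact code.
    """
    n1 = len(y_completers)
    p_hat = n1 / n_total                     # estimated completion probability
    mean_completers = sum(y_completers) / n1
    w = [y**alpha for y in y_completers]     # tilting weights y**alpha
    mean_dropouts = sum(wi * y for wi, y in zip(w, y_completers)) / sum(w)
    return p_hat * mean_completers + (1 - p_hat) * mean_dropouts

# Illustrative completer outcomes with 4 completers out of 6 subjects.
y = [250.0, 300.0, 400.0, 550.0]

# alpha < 0 pulls the estimated overall mean below the completer mean.
low = mean_under_alpha(y, 6, -1.0)
```

Repeating this computation over a grid of *α* values produces sensitivity curves of the kind displayed in Figure 5.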

To proceed with a Bayesian analysis, an expert must specify his/her prior beliefs about the model parameters. In light of the pattern-mixture model (1) and its selection model analog (3) where *q*(*Y*) = *α* log(*Y*), we can parametrize the model by either (a) (*α, p, F*_{1}) or (b) (*α, η, F*). Our goal is to estimate the posterior distribution of *μ* = *μ*(*F*). We denote this posterior distribution by *π*(*μ*|* O*). We know that the posterior for *μ* can be written as

*π*(*μ*|* O*) = ∫ *π*(*μ*|*α*, * O*) *π*(*α*|* O*) d*α*. (4)

Furthermore, under parametrization (a), *π*(*α*|* O*) can be written as

*π*(*α*|* O*) = ∫∫ *π*(*α*|*p*, *F*_{1}) *π*(*p*, *F*_{1}|* O*) d*p* d*F*_{1}, (5)

and under parametrization (b), *π*(*α*|* O*) can be written as

*π*(*α*|* O*) ∝ *π*(*α*) ∫∫ *L*(* O*; *α*, *η*, *F*) *π*(*η*) *π*(*F*) d*η* d*F*, (6)

where *L*(* O*; *α*, *η*, *F*) denotes the likelihood of the observed data.

From (5), note that the posterior distribution for *α* will be exactly equal to the prior distribution on *α* (i.e. *π*(*α*|* O*) = *π*(*α*)) whenever *α* is, *a priori*, independent of (*p*, *F*_{1}). Under parametrization (b), in contrast, *a priori* independence of *α*, *η*, and *F* induces *a priori* dependence between *α* and (*p*, *F*_{1}), so that the posterior for *α* can differ from its prior.

To complete the model, we need to specify prior distributions for *α, η*, and *F*. We specify independent priors for each: *π*(*α*, *η*, *F*) = *π*(*α*) *π*(*η*) *π*(*F*).

In particular, we will assume an informative prior on *α*, a relatively non-informative prior on *η*, and a Dirichlet process prior on *F*, with known degrees of freedom, *df* (precision), and base measure, *H*(* θ*), of known form but indexed by an unknown parameter vector, * θ*, on which we place a hyper-prior.

To elicit a prior on *α*, one can ask an expert about the odds of drop-out for a proportional change in response. If the expert is comfortable with a parametric form for the prior, a most likely value might be specified along with several quantiles to uniquely determine the prior distribution. An alternative approach would be to use a non-parametric prior for which the expert would provide a best guess and then attach weights to intervals around that guess to form a histogram. The histogram might then be smoothed to facilitate inference. For a good discussion of elicitation of priors, see Chaloner (1996).

As a word of caution, we note that when informative priors on both *α* and *F* are chosen, we may observe situations where the priors, taken together, are not ‘compatible’ with the observed data. For example, suppose in our analysis of the ACTG 175 data, we assumed a left-skewed parametric family for *F* and assumed that *α* has most of its probability mass between –2.0 and –1.0. Given the observed data (*α* = 0 in Figure 1), a sampling algorithm would tend to impute large values of *Y* for the missing data based on the distributional form of *F*, while from the assumptions about *α*, the algorithm would tend to impute small values of *Y* for the missing data. This can result in bi-modal posterior distributions for (*α, η, μ*(*F*)). We believe such a scenario would be viewed as problematic and the resulting inferences as unsatisfactory. To avoid the bi-modality, we suggest that the experts re-evaluate their priors by reducing the ‘informativeness’ of one of them.

To sample from the posterior distribution of (*α, η, F, θ*), we will use a Gibbs sampling algorithm with Metropolis–Hastings steps (Smith and Roberts, 1993). We will also include a data augmentation step (Tanner and Wong, 1987) which will greatly simplify the algorithm by making it a complete data problem upon inclusion of the augmented data. Each of the *K* iterations of the algorithm cycles through the following three steps:

- Step 1: sample from *π*(*F*, * θ* | *α*, *η*, * Y_{miss}*, * O*)
- Step 2: sample from *π*(*α*, *η* | *F*, * θ*, * Y_{miss}*, * O*)
- Step 3: sample from *π*(* Y_{miss}* | *α*, *η*, *F*, * θ*, * O*)

where * Y_{miss}* is the vector of outcomes for drop-outs. We now provide details on each step.

To sample from *π*(*F*, * θ* | *α*, *η*, * Y_{miss}*, * O*) in Step 1, we first draw * θ* and then draw *F* given * θ*.

We propose to sample from the distributions of each component of * θ* conditional on (*α*, *η*, * Y_{miss}*, * O*) and the remaining components of * θ*. Given * θ*, an approximate draw of *F* from its full conditional can then be obtained as follows:

- 1a. Fix *J* to be a large integer.
- 1b. Draw *B*_{1}, . . . , *B*_{J} i.i.d. from a *Beta*(1, *df* + *n*) distribution.
- 1c. Draw *V*_{1}, . . . , *V*_{J} i.i.d. using the following scheme. For each *j*, draw *U _{j}* ~ Uniform(0, 1).
  - 1c.1. If *U _{j}* < *df*/(*df* + *n*), draw *V _{j}* from *H*(* θ*).
  - 1c.2. Otherwise, draw *V _{j}* from the empirical cdf of *Y* (i.e. mass 1/*n* at each *Y _{i}*).
- 1d. Form *F* = Σ_{*j*=1}^{*J*} *p _{j}* *δ*_{*V _{j}*}, where *p _{j}* = *B _{j}* Π_{*l* < *j*}(1 – *B _{l}*) and *δ _{a}* is the probability measure giving unit mass at the point *a*. This will be an approximate sample from the full conditional.
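Steps 1a–1d can be sketched as follows; `base_sampler` stands in for a draw from the base measure *H*(* θ*), and the function is an illustration of the truncated stick-breaking construction rather than the authors' implementation:

```python
import random

def sample_F(df, base_sampler, y_complete, J=2000, rng=random):
    """Approximate draw of F from its full conditional given the complete data.

    Returns atoms V and weights p of a discrete distribution sum_j p_j * delta_{V_j}.
    `base_sampler` is a stand-in for a draw from the base measure H(theta).
    """
    n = len(y_complete)
    atoms, sticks = [], []
    for _ in range(J):                        # steps 1b-1c
        b = rng.betavariate(1.0, df + n)      # B_j ~ Beta(1, df + n)
        if rng.random() < df / (df + n):      # w.p. df/(df+n): draw from H(theta)
            v = base_sampler()
        else:                                 # otherwise: empirical cdf of Y
            v = rng.choice(y_complete)
        sticks.append(b)
        atoms.append(v)
    # Step 1d: p_j = B_j * prod_{l<j} (1 - B_l), renormalized after truncation
    weights, remaining = [], 1.0
    for b in sticks:
        weights.append(b * remaining)
        remaining *= 1.0 - b
    total = sum(weights)
    weights = [w / total for w in weights]
    return atoms, weights
```

For the analysis of Section 6.3, `base_sampler` might be, e.g., `lambda: random.lognormvariate(5.5, 1.0)`, matching a log-normal base measure.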

To sample from the full conditional distribution of (*α, η*) in Step 2, we will use a Metropolis–Hastings algorithm with candidate distribution a normal- or *t*-approximation to the full conditional distribution. With the augmented data, this is equivalent to sampling the coefficients in a Bayesian logistic regression model.

To sample from *π*(* Y_{miss}* | *α*, *η*, *F*, * θ*, * O*) in Step 3, we use, for each drop-out, the following rejection sampling scheme:

- 3a. Draw an observation, *Y _{cand}*, from *F*, using an inverse cdf approach.
- 3b. Draw *U* ~ Uniform(0, 1).
- 3c. If *U* < exp(*η* + *α* log(*Y _{cand}*))/(1 + exp(*η* + *α* log(*Y _{cand}*))), then set *Y _{miss}* = *Y _{cand}*. Otherwise, go to step 3a.

By ‘imputing’ the missing data in this way, we can work with the simpler complete data problem.
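Steps 3a–3c amount to rejection sampling from the drop-out distribution: propose from the current draw of *F* and accept with probability expit(*η* + *α* log *y*). A sketch, assuming *F* is represented as a discrete distribution over atoms with weights (as in the truncated approximation of Step 1):

```python
import random
from math import exp, log

def sample_missing_outcome(atoms, weights, alpha, eta, rng=random):
    """Draw one Y_miss by rejection (steps 3a-3c): propose from the current
    draw of F (discrete atoms/weights) and accept with probability
    expit(eta + alpha*log(y)), the modeled drop-out probability."""
    while True:
        y_cand = rng.choices(atoms, weights=weights, k=1)[0]   # step 3a
        u = rng.random()                                       # step 3b
        lin = eta + alpha * log(y_cand)
        accept_prob = exp(lin) / (1.0 + exp(lin))              # step 3c
        if u < accept_prob:
            return y_cand
```

With *α* < 0, candidates with low outcomes are accepted more often, so the imputed drop-out values are pulled toward the left tail, exactly the behavior described by the selection bias representation.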

With *df* << *n* and *n* large, a semiparametric version of the Bernstein–von Mises theorem (van der Vaart, 2000) suggests that the posterior of *μ*(*F*) given *α* will be well approximated by a normal distribution with mean *μ*(*F _{n}*; *α*) and variance equal to the estimated asymptotic variance of *μ*(*F _{n}*; *α*) described in Section 4.

When *df* >> *n, n* is large, and we specify relatively non-informative priors for *α, η, θ*, the model is approximately equivalent to a frequentist parametric model in which (3) holds with *q*(*Y*) = *α* log(*Y*) and the marginal distribution of *Y* is given by the base measure *H*(* θ*).

When *df* is small and the sample size *n* is large, we saw in the previous section that given *α* and the data * O*, the distribution of *μ*(*F*) is approximately normal, centered at *μ*(*F _{n}*; *α*). As a convergence check, we can therefore compare, at selected values of *α*, the distribution of *μ* produced by the Gibbs sampler with this approximation.

As a more generic check, we suggest the multiple chains approach of Gelman and Rubin (1992). This approach will also help diagnose bi-modality of the posterior distribution.

The only way to assess the adequacy of the model is to examine its fit to the observed data. This is especially important when *df* is taken to be large. A simple way to check the model here is to see how well the posterior distribution of *F*_{1} matches up with the empirical cdf of the observed data. We can use the posterior predictive checking approach of Gelman, Meng, and Stern (1996). That is, we sample a new set of observed *Y* from their posterior predictive distribution, using the same approach that we used to sample * Y_{miss}*. For each sample, we compute an empirical cdf based on the new set of observed *Y* and compare it graphically with the empirical cdf of the data actually observed.
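The graphical check just described can also be reduced to a simple numerical summary: the largest vertical gap between the observed empirical cdf and each replicated one. A sketch (the function names and grid are ours):

```python
def ecdf(values, grid):
    """Empirical cdf of `values` evaluated at each point of `grid`."""
    s = sorted(values)
    n = len(s)
    out = []
    i = 0
    for g in grid:
        while i < n and s[i] <= g:
            i += 1
        out.append(i / n)
    return out

def max_ecdf_discrepancy(observed, replicated, grid):
    """Largest vertical gap between the observed ecdf and a replicated ecdf,
    a numerical companion to the graphical posterior predictive check."""
    f_obs = ecdf(observed, grid)
    f_rep = ecdf(replicated, grid)
    return max(abs(a - b) for a, b in zip(f_obs, f_rep))
```

Applying `max_ecdf_discrepancy` to each of the 100 replicated data sets gives a reference distribution against which a poorly fitting model (as in the first row of Figure 4) stands out.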

One approach that has been proposed in the statistical literature is to specify parametric models for both the conditional distribution of *R* given *Y* and the marginal distribution of *Y* (Diggle and Kenward, 1994). For ACTG 175, it might be assumed that *Y* is log-normal with parameter vector * θ* = (*γ*, *τ*^{2}) and that the selection model (3) holds with *q*(*Y*) = *α* log(*Y*). Under these parametric assumptions, all parameters, including *α*, are identified.

In Figure 2, we present the treatment-specific posterior distributions of *α* and *μ* (dashed lines), based on *K* = 10 000 iterations (first 1000 iterations discarded; overall acceptance rate in Step 2 of the algorithm was 72.6% and 74.8% in the AZT+ddI and ddI arms, respectively). The approximate maximum likelihood estimates (standard errors; 95% credible intervals) for *α* are –2.58 (0.24; [–3.00, –2.09]) and –2.76 (0.26; [–3.25, –2.22]) for the AZT+ddI and ddI arms, respectively. Under the log-normality assumption, we would reject the treatment-specific null hypothesis of missing at random. The corresponding estimates for *μ* are 303.42 (13.63; [277.67, 331.20]) and 296.68 (13.51; [271.49, 324.17]). The posterior distribution of the difference between the means in the AZT+ddI and ddI arms is displayed in Figure 3. The approximate maximum likelihood estimate of the difference is 6.74 (19.27; [–30.89, 43.92]), suggesting no significant treatment effect. To check the model fit, we use the posterior predictive checking approach described in Section 5.5. The first row of Figure 4 displays the empirical distribution of the observed CD4 counts at week 56 (solid line) along with 100 empirical distribution functions of observed outcomes drawn from its posterior predictive distribution. As the figure illustrates, the model does not fit the observed data well. To achieve a better fit, we tried two alternative models for the marginal distribution of *Y*. In particular, we let the cube root and square root of *Y* be normally distributed. These models fit slightly better than the log-normal, but still did not fit the observed data well. Under these alternative models, the approximate maximum likelihood estimates and confidence intervals for *α* were of similar order of magnitude as the log-normal model.

Treatment-specific posterior distributions for *α* and *μ*. Dashed lines (Model with *df* = 10,000, *π*(*α*) ~ *N*(0, 10), *π*(*η*) ~ *N*(0, 100), *π*(*γ*) ~ *N*(5.5, 1.0) and *π*(1/*τ*^{2}) ~ *G*(1, 10)); **...**

Posterior distribution for *μ*(*AZT* + *ddI*) – *μ*(*ddI*). Dashed lines (Model with *df* = 10,000, *π*(*α*) ~ *N*(0, 10), *π*(*η*) ~ *N*(0, 100), *π*(*γ*) ~ *N*(5.5, 1.0) and *π*(1/*τ*^{2}) ~ *G*(1, 10)); **...**

When the distribution of the CD4 count at week 56 is left unspecified, we saw in Section 4 that *α* is not identified. In the absence of such distributional information, the best that can be achieved from a frequentist perspective is to perform a sensitivity analysis. In Figure 5, we present the treatment-specific estimated overall mean CD4 count at week 56 (with 95% confidence intervals) as a smooth function of the selection bias parameter *α* (*α* ranges from –2.0 to 2.0). Note how the sampling variability (as indicated by the width of the confidence intervals) is dominated by the uncertainty in *α*. That is, the width of any given confidence interval is approximately 40–50 CD4 cell counts, while the range of estimated means across *α* is approximately 200–300 CD4 cell counts. Common statistical practice is to simply report one of these confidence intervals, which can grossly misrepresent treatment efficacy.

Treatment-specific estimated mean CD4 count at week 56 (with 95% confidence intervals) as a function of the selection bias parameter *α*.

In Figure 6, we present a contour plot of the *Z*-statistic (estimated difference in means divided by standard error of the difference) associated with the test of the null hypothesis of no treatment difference as a function of treatment-specific selection bias parameters. On the horizontal (vertical) axis, we vary the selection bias parameter for the AZT+ddI (ddI) arm. Regions marked with a treatment label indicate that, for selection bias parameter combinations in the region, a 0.05 level test of the null hypothesis would be rejected in favor of that treatment. The solid point in the contour plot indicates the result from the missing at random assumption in both treatment groups. The conclusion from this contour plot is that with mild levels of differential selection bias in the treatment arms (e.g. *α* = 0 in the ddI arm and *α* = –0.025 in the AZT+ddI arm), we would change the conclusion based on the default analysis. As a result, the evidence in favor of AZT+ddI appears to be ‘weaker’ than that based on missing at random. As we see, this sensitivity analysis may be viewed as limited in its use since it does not provide an overall quantification of the strength of evidence, accounting for uncertainty in beliefs about selection bias.

When a decision is required, the flexible Bayesian methodology described in Section 5 can be used to summarize the treatment efficacy in the presence of uncertainty regarding the distribution of the outcome and the level of selection bias.

For each treatment group, we assumed that the distribution of *Y, F*, followed a Dirichlet process prior with precision *df* = 1, and log-normal base measure with parameter vector *θ* = (*γ, τ*^{2}). We assumed that the hyper-priors for *γ* and *τ*^{2} were independent. In particular, we let *π*(*γ*) ~ *N*(5.5, 1) and *π*(1/*τ*^{2}) ~ Γ(1, 10). In addition, we assumed an independent, non-informative, normal prior on *η*, i.e. *π*(*η*) ~ *N*(0, 100). For *α*, we use *π*(*α*) ~ *N*(–0.5, 0.25^{2}). That is, the prior belief is that subjects with lower CD4 counts (under full compliance) at week 56 are more likely to drop out. Specifically, the prior states that there is a 95% chance that the odds ratio of drop-out for subjects with a two-fold change in CD4 cells at week 56 is between 1 and 2, with a most probable value around 1.4.
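The stated reading of the prior *π*(*α*) ~ *N*(–0.5, 0.25^{2}) can be verified directly: the odds ratio of drop-out for a two-fold lower outcome is 2^{–*α*}, which lies in (1, 2) exactly when *α* lies in (–1, 0), an interval of ±2 prior standard deviations:

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sd = -0.5, 0.25                  # prior: alpha ~ N(-0.5, 0.25^2)

# 2**(-alpha) in (1, 2)  <=>  alpha in (-1, 0)
prob = normal_cdf((0.0 - mu) / sd) - normal_cdf((-1.0 - mu) / sd)
# prob = Phi(2) - Phi(-2), about 0.954: the "95% chance" quoted above.

most_probable_or = 2.0 ** (-mu)      # 2**0.5, about 1.41: the "around 1.4"
```

This kind of back-calculation is a useful sanity check when translating an elicited odds-ratio statement into a prior on *α*.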

Figure 2 displays the treatment-specific posterior distributions of *α* and *μ* (solid lines), based on *K* = 10 000 iterations (first 1000 iterations discarded; acceptance rate in Step 2 of the algorithm was 80.0% and 80.3% in the AZT+ddI and ddI arms, respectively). The prior distribution of *α* is shown as a dotted line, so that the difference between the prior and the posteriors is visible. The posterior means (95% credible intervals) for *α* are –0.50 ([–0.90, –0.14]) and –0.49 ([–0.83, –0.15]) for the AZT+ddI and ddI arms, respectively. For comparison, the prior mean (95% credible interval) for *α* was –0.50 ([–1.0, 0.0]) for both treatment arms. The treatment-specific posterior distributions of *α* are tighter than the prior distributions, indicating a weak level of induced *a priori* dependence between *α* and (*p, F*_{1}). The posterior means (95% credible intervals) for *μ* are 368.24 ([342.43, 390.57]) and 348.00 ([330.40, 365.40]) for the AZT+ddI and ddI arms, respectively. Figure 3 displays the posterior distribution of the difference between *μ*(*AZT* + *ddI*) and *μ*(*ddI*) (solid line). The posterior mean (95% credible interval) is 20.24 ([–11.12, 48.63]). The posterior probability that *μ*(*AZT* + *ddI*) is greater than *μ*(*ddI*) is 90.76%. After accounting for prior beliefs regarding selection bias, there appears to be relatively strong evidence in favor of combination therapy.

In Figure 7, we display the treatment-specific convergence diagnostic described in Section 5.4 for *α* = –0.8, –0.65, –0.5, –0.35, –0.2. The solid line is the estimated distribution of *π*(*μ*|**O**, *α*) based on our non-parametric maximum likelihood results of Section 4 and the dotted line is the estimated distribution based on the Gibbs sampling scheme. The two estimates are quite close, providing support for convergence. The second row of Figure 4 displays the empirical distribution of the observed CD4 counts at week 56 (solid line) along with 100 empirical distribution functions of observed outcomes drawn from its posterior predictive distribution. As the figure illustrates, the model, as expected, fits the observed data well.

In evaluating treatment effects in the presence of missing data, the analyst usually starts with the default missing at random analysis. Under missing at random, the estimated means are 385 and 360 in the AZT+ddI and ddI arms, respectively. The difference in means is 25 CD4 cells and the null hypothesis of no treatment difference is rejected.

Recognizing that missing at random is likely to fail, the analyst might consider a parametric selection model analysis, similar to the one conducted in Section 6.1. There are two important lessons to be learned from this analysis. The first lesson is that model selection is difficult and model checking is critical. In our analysis, we found that some of the models typically used to fit CD4 count data did not fit the observed data well. To achieve better fits, one needs to either use a more flexible model for the distribution of *Y* or fit a more flexible form for the selection bias function *q*(*Y*). The second lesson is that the distributional form of *Y* can determine the magnitude of selection bias and one must make sure that the level of selection bias is substantively plausible. Using the log-normal assumption, the estimated value of *α* is –2.58 and –2.76 in the AZT+ddI and ddI arms, respectively. Missing at random is rejected in both treatment arms. These levels of selection bias are enormous; the imputed distribution of *Y* among drop-outs is more extreme than the left-most histograms in Figure 1. They correspond to the belief that for two subjects who have a two-fold difference in CD4 cells at week 56, the sicker subject is 6.0 and 6.7 times as likely to drop out in the AZT+ddI and ddI arms, respectively. These levels of selection bias are highly implausible. With these levels, the estimated means are 303 and 297 for the AZT+ddI and ddI arms. This is a huge reduction from the estimated means under missing at random. The difference in means is 6.7 CD4 cells and is not statistically significant. Similar results were observed in the alternative models we considered. While one can argue that these levels of selection bias are due to ill-fitting models, we conjecture that one can posit a more flexible model for the marginal distribution of *Y* which fits the data well but nevertheless still identifies a highly implausible level of selection bias.

The frequentist non-parametric analysis in Section 6.2 suggested that evidence in favor of AZT+ddI is weaker than that provided by the missing at random analysis. However, the drawback of the sensitivity analysis is that a single answer is not provided and the level of uncertainty is not quantified. Using treatment-specific informative priors on *α*, the flexible Bayesian analysis of Section 6.3 provides a quantification of uncertainty through posterior distributions. In this analysis, the treatment-specific distributions for *α* are slightly narrower than the prior specifications, indicating that, within the context of our fully Bayesian model, the data provide relatively little information about *α*. The estimated means are 368 and 348 in the AZT+ddI and ddI arms, which are, as expected, lower than the missing at random means, and more plausible than those from the fully parametric analysis. The posterior distributions (solid lines in Figure 2) indicate the degree of uncertainty regarding the treatment-specific means. The degree of uncertainty is comparable to that provided by the parametric analysis and much larger than that of the missing at random analysis. The estimated mean difference is 20 CD4 cells and the posterior distribution for the difference (solid line in Figure 3) indicates the level of uncertainty as reflected by the span of the 95% credible interval and the 91% chance that AZT+ddI is superior to ddI. This is a concise representation of the strength of evidence regarding the treatment effect. As this analysis incorporates prior beliefs about selection bias and fits the observed data well, it may be more plausible than the missing at random analysis.

In our view, the fully parametric approach above should only be used when there is strong scientific evidence to support the use of particular parametric models for the distribution of the outcome *and* the selection mechanism. When there are only strong prior beliefs about the distribution of the outcome, the approach of Lee and Berger (2001) or our approach with large weight given to the base measure of the prior distribution for the outcome and a high-dimensional parametrization of the selection bias function with vague priors can be used. In situations where informative prior beliefs about the selection bias can be quantified, our flexible Bayesian approach is an attractive way of summarizing the treatment effect and its associated uncertainty. If a flexible Bayesian analysis is implemented, we feel that it should be conducted in conjunction with the sensitivity analysis, as this latter analysis provides informal and formal checks for the former. Finally, if experts cannot agree on the distributional form or on the nature of selection bias, then the only objective analysis is to present worst-case bounds.

In the ACTG 175 data, missingness of the outcome occurred for multiple reasons (i.e. non-compliance, skipped clinic visits, and loss to follow-up). In our selection model, we did not distinguish between these various causes of missingness, which could very well have different relationships with the outcome of interest. It is not difficult to extend our approach to this setting. Specifically, one could re-define *R* to have unique values corresponding to each major type of missingness and a unique value for completion. Then, one would fit a polytomous logistic regression with type-specific intercepts and selection bias functions. To allow for differential relationships, these functions would have to be parameterized separately, with priors elicited for each.
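The polytomous extension can be sketched numerically. Assuming, purely for illustration, linear type-specific selection bias functions *q<sub>r</sub>*(*y*) = *α<sub>r</sub>* *y* and hypothetical intercept values (neither is prescribed by the paper):

```python
# Sketch of the proposed polytomous (multinomial) logistic selection model.
# R = 0 denotes completion (the reference category); r = 1..K index the
# missingness types (e.g. non-compliance, skipped visit, loss to follow-up).
# The intercepts and linear bias functions q_r(y) = alpha_r * y are
# hypothetical choices for illustration only.
import numpy as np

def selection_probs(y, intercepts, alphas):
    """P(R = r | Y = y) for r = 0..K, with completion as reference."""
    # Linear predictors for each missingness type; completion has predictor 0.
    eta = np.concatenate(([0.0], intercepts + alphas * y))
    expeta = np.exp(eta - eta.max())          # numerically stabilized softmax
    return expeta / expeta.sum()

intercepts = np.array([-2.0, -1.5, -2.5])     # type-specific intercepts
alphas = np.array([-0.002, 0.0, -0.004])      # type-specific selection bias

probs = selection_probs(y=350.0, intercepts=intercepts, alphas=alphas)
print(probs)   # [P(complete), P(type 1), P(type 2), P(type 3)]
```

Setting a given *α<sub>r</sub>* to zero encodes the belief that the corresponding cause of missingness is unrelated to the outcome, so the prior elicitation can proceed cause by cause.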

In most randomized trials, the data structure is much more complicated than the one handled in this paper. Specifically, baseline information is usually collected and the primary outcomes may be failure times or repeated measures. The sensitivity analysis ideas have been extended to deal with these more realistic settings. In future work, we will focus on extensions of our flexible Bayesian approach. The task will be considerably more difficult, as more information implies more modeling and more extensive prior specification. For example, if we were to consider high-dimensional baseline prognostic factors *X* as part of our observed data, then the analog of our model (3) would be, say,

logit Pr[*R* = 1 | *X*, *Y*] = *h*(*X*) + *q*(*X*, *Y*),

where *h*(*X*) is some unknown function of *X* and *q*(*x, y*) is a specified function of *x* and *y*. We would then need to specify an informative prior for *q*, and flexible priors for *h*(*X*) and the conditional distribution of *Y* given *X*. Flexible priors are important here to prevent unintended identification of *q*. In addressing even this simple extension, substantial dimension reduction will be required and the computational complexity of the sampling algorithm will increase substantially.
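As a concrete illustration of a selection model of the form logit Pr[*R* = 1 | *X*, *Y*] = *h*(*X*) + *q*(*X*, *Y*), the following sketch evaluates the implied response probability for hypothetical choices of *h* and *q* (both are placeholders, not the functions a real analysis would use; in the Bayesian analysis *h* would receive a flexible prior and *q* an informative one):

```python
# Sketch of the extended selection model with baseline covariates X:
#   logit P(R = 1 | X, Y) = h(X) + q(X, Y).
# h and q below are hypothetical placeholders for illustration only.
import numpy as np

def h(x):
    # Placeholder for the unknown baseline function of covariates X.
    return 1.0 - 0.5 * x[0] + 0.25 * x[1]

def q(x, y):
    # Placeholder selection bias function of (x, y); a negative slope in y
    # encodes the belief that sicker subjects are more likely to drop out.
    return -0.001 * y * (1.0 + 0.1 * x[0])

def p_observed(x, y):
    """P(R = 1 | X = x, Y = y) under the logistic selection model."""
    eta = h(x) + q(x, y)
    return 1.0 / (1.0 + np.exp(-eta))

x = np.array([1.0, 2.0])
print(p_observed(x, y=350.0))   # ≈ 0.649
```

Even this toy version makes the dimension-reduction issue visible: *h* must be learned flexibly over the covariate space, while *q* must remain interpretable enough for prior elicitation.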

Future work will also focus on developing (1) techniques for elicitation of prior beliefs for the selection bias parameter, (2) faster sampling algorithms, and (3) formal proofs of the large sample properties of our flexible Bayesian procedure.

This research was partially supported by National Institutes of Health grants CA85295, MH56639, HD38209, and AI32475. The authors would like to thank the associate editor and two reviewers for their helpful comments, which have greatly improved the quality of the manuscript.

DANIEL O. SCHARFSTEIN, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Email: dscharf@jhsph.edu.

MICHAEL J. DANIELS, Department of Statistics, University of Florida, Gainesville, FL 32611, USA.

JAMES M. ROBINS, Department of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.

- Antoniak C. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics. 1974;2:1152–1174.
- Bayarri M, Berger J. Robust Bayesian analysis of selection models. The Annals of Statistics. 1998;26:645–659.
- Bayarri M, DeGroot M. Bayesian analysis of selection models. The Statistician. 1987;36:137–146.
- Chaloner K. Elicitation of prior distributions. In: Berry D, Stangl D, editors. Bayesian Biostatistics. Dekker; New York: 1996. pp. 141–156.
- Daniels M, Hogan J. Reparameterizing the pattern-mixture model for sensitivity analyses under informative drop-out. Biometrics. 2000;56:1241–1248. [PubMed]
- De Gruttola V, Tu X. Modelling progression of CD4-lymphocyte count and its relationship to survival. Biometrics. 1994;50:1003–1014. [PubMed]
- Diggle P, Kenward MG. Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics. 1994;43:49–73.
- Doss H. Bayesian nonparametric estimation for incomplete data via successive substitution sampling. The Annals of Statistics. 1994;22:1763–1786.
- Forster JJ, Smith PWF. Model-based inference for categorical survey data subject to non-ignorable non-response. Journal of the Royal Statistical Society, Series B. 1998;60:57–70.
- Gelman A, Meng X-L, Stern H. Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica. 1996;6:733–760.
- Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences (with discussion). Statistical Science. 1992;7:457–472.
- Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH, Henry WK, Lederman MM, Phair JP, Niu M, et al. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine. 1996;335:1081–1090. [PubMed]
- Lee J, Berger J. Semiparametric Bayesian analysis of selection models. Journal of the American Statistical Association. 2001;96:1397–1409.
- Little R. A class of pattern-mixture models for normal missing data. Biometrika. 1994;81:471–483.
- Neal RM. Sampling from multimodal distributions using tempered transitions. Statistics and Computing. 1996;6:353–366.
- Raab GM, Donnelly CA. Information on sexual behavior when some data are missing. Applied Statistics. 1999;48:117–133.
- Robins J. The analysis of randomized and non-randomized aids treatment trials using a new approach to causal inference in longitudinal studies. In: Sechrest L, Freeman H, Mulley A, editors. Health Service Research Methodology: A Focus on AIDS. U.S. Public Health Service, National Center for Health Services Research; Washington, D.C.: 1989. pp. 113–159.
- Robins JM, Rotnitzky A, Scharfstein DO. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran EM, Berry D, editors. Statistical Models in Epidemiology, the Environment, and Clinical Trials. Springer; New York: 2000. pp. 1–94.
- Rotnitzky A, Robins JM, Scharfstein DO. Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association. 1998;93:1321–1339.
- Rotnitzky A, Scharfstein D, Su T, Robins J. Methods for conducting sensitivity analysis of trials with potentially non-ignorable competing causes of censoring. Biometrics. 2001;57:103–113. [PubMed]
- Rubin D. Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association. 1977;72:538–543.
- Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
- Scharfstein D, Robins J, Eddings W, Rotnitzky A. Inference in randomized studies with informative censoring and discrete time-to-event outcomes. Biometrics. 2001;57:404–413. [PubMed]
- Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). Journal of the American Statistical Association. 1999;94:1096–1146.
- Smith AFM, Roberts GO. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). Journal of the Royal Statistical Society, Series B. 1993;55:3–23.
- Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association. 1987;82:528–540.
- van der Vaart A. Asymptotic Statistics. Cambridge University Press; Cambridge: 2000.
