In this paper, we have developed both frequentist and Bayesian methods for analyzing binary outcomes whose missingness is thought to be informative, i.e. related to the unobserved outcome. Our relative risk parametrization of selection bias, incorporation of discrete covariates, and prior dependence of the selection bias across treatment groups extend earlier work (Scharfstein and others, 2003), which used the log odds ratio parametrization without covariates. We elicited informative prior distributions by age group and vaccination status from an influenza expert. Bayesian inference with informative priors on the selection bias parameters provides a useful and parsimonious way of drawing inference about VE that incorporates expert uncertainty about the missing data mechanism. In addition, the Bayesian approach provides better small-sample inference.
Either sensitivity analysis approach, frequentist or Bayesian, provides much greater detail than the single summary from the fully Bayesian analysis. As a consequence, its results are harder to visualize when the dimension of the selection parameters is greater than 3 or 4. While the advantage of the Bayesian sensitivity analysis is its finite-sample performance, it is computationally very intensive. The frequentist sensitivity analysis is computationally more feasible and will perform well when sample sizes are large.
We used our proposed methods to re-analyze an influenza vaccine field study. We conducted a formal sensitivity analysis to evaluate the effects of preferentially selecting children with non-specific illness for surveillance cultures to confirm true influenza. Our analysis showed that, under plausible ranges of selection bias, the VE estimates, though lower than those obtained assuming MAR, are substantially higher than those based on the non-specific influenza-like illness definition alone. Our methods will be generally useful in future vaccine field studies, or other similar studies, in which confirmatory biological specimens are not MAR.
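As a rough illustration of the kind of sensitivity analysis described above, the following Python sketch evaluates VE over a grid of selection bias parameters. It is not the estimation procedure of this paper. It rests on several assumptions that are ours for the purpose of the example: β is read as the relative risk P[Y = 1 | A = 1, R = 0] / P[Y = 1 | A = 1, R = 1], where R = 1 denotes that a culture was obtained; Y = 0 whenever A = 0; covariates are ignored; the counts are invented and are not the study data; and only plug-in point estimates are computed, without standard errors. The function names are hypothetical.

```python
from itertools import product


def risk_under_selection(n, n_ill, n_cultured, n_positive, beta):
    """Plug-in estimate of P[Y = 1] in one arm for a fixed beta.

    Assumed (illustrative) model:
      * Y = 0 whenever A = 0 (no non-specific illness, no influenza);
      * P[Y=1 | A=1, R=0] = beta * P[Y=1 | A=1, R=1].
    """
    p_ill = n_ill / n                        # P[A = 1]
    p_cultured = n_cultured / n_ill          # P[R = 1 | A = 1]
    p_pos_cultured = n_positive / n_cultured # P[Y = 1 | A = 1, R = 1]
    # Cap at 1 so that a large beta cannot produce an invalid probability.
    p_pos_uncultured = min(1.0, beta * p_pos_cultured)
    p_pos_given_ill = (p_cultured * p_pos_cultured
                       + (1.0 - p_cultured) * p_pos_uncultured)
    return p_ill * p_pos_given_ill


def vaccine_efficacy(vaccine, placebo, beta_vaccine, beta_placebo):
    """VE = 1 - risk ratio (vaccine vs. placebo) for fixed betas."""
    risk_v = risk_under_selection(*vaccine, beta_vaccine)
    risk_p = risk_under_selection(*placebo, beta_placebo)
    return 1.0 - risk_v / risk_p


# Hypothetical counts (n, n_ill, n_cultured, n_positive); NOT study data.
vaccine_arm = (1000, 200, 50, 5)
placebo_arm = (1000, 300, 60, 24)

# beta = 1 in both arms corresponds to MAR within the ill participants.
betas = [0.25, 0.5, 1.0]
grid = {
    (bv, bp): vaccine_efficacy(vaccine_arm, placebo_arm, bv, bp)
    for bv, bp in product(betas, betas)
}

for (bv, bp), ve in sorted(grid.items()):
    print(f"beta_vaccine={bv:.2f}  beta_placebo={bp:.2f}  VE={ve:.3f}")
```

With two arms and no covariates the grid is two-dimensional and can be displayed as a table or contour plot; each additional covariate stratum adds further β parameters, which is where visualization becomes difficult.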
For our development in this paper, we made the assumption that any person who does not present with non-specific influenza-like illness also does not have medically attended influenza illness. Implicitly, we assumed a degenerate prior at zero for P_{z,x}[Y = 1 | A = 0]. In many vaccine field studies, cases are ascertained on symptoms, then possibly confirmed biologically. Those participants without symptoms are generally not confirmed biologically. In the study analyzed here, ascertainment was passive through clinic visits, so our efficacy measure was for medically attended influenza illness. Ascertainment on symptoms could also be active, say, through regular phone calls to the home. Our methods could be used for studies with an active ascertainment method without further extension, whereby A(z) = 1 would denote being symptomatic by whatever case definition was used for that study, and the interpretation of VE would be for symptomatic influenza.
It is straightforward to extend our method to the situation in which infection is confirmed, perhaps serologically, in a sample of people who did not have non-specific illness, A(z) = 0. In this case, we would not need Assumption 2. The scientific question would then be to estimate efficacy against infection, rather than against medically attended disease as in this paper. There would be additional selection bias parameters in that situation. However, if a study were planned well enough to sample asymptomatic participants, one would hope that the sampling would be random by design, so that selection bias would not play a role.
Our informative priors for the selection bias parameters were assumed to be multivariate Normal on the log β scale. Prior distributions could also be constructed less parametrically using the information elicited from the experts. When our methods are used to analyze future studies, the providers who decide from whom to obtain biological specimens could be asked directly about their prior beliefs regarding selection bias.
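To make the form of such a prior concrete, the following Python sketch draws a pair of selection bias parameters, one per treatment group, from a bivariate Normal distribution on the log β scale with correlation between groups. The hyperparameters (prior medians of 0.5, standard deviations of 0.3, correlation 0.8) are hypothetical placeholders, not the values elicited from our expert, and the function names are ours. The sketch covers only two groups and samples from the prior alone; it does not perform any posterior computation.

```python
import math
import random


def draw_log_beta_pair(mean_v, mean_p, sd_v, sd_p, rho, rng):
    """One draw of (log beta_vaccine, log beta_placebo) from a bivariate
    Normal with correlation rho, built from two independent standard
    Normal variates (a hand-written 2x2 Cholesky factorization)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    log_beta_v = mean_v + sd_v * z1
    log_beta_p = mean_p + sd_p * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return log_beta_v, log_beta_p


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hyperparameters; NOT the elicited values.
PRIOR_MEDIAN = 0.5   # prior median of beta in each group
PRIOR_SD = 0.3       # prior standard deviation of log beta
PRIOR_RHO = 0.8      # prior correlation of log beta across groups

rng = random.Random(2024)  # fixed seed so the draws are reproducible
log_draws = [
    draw_log_beta_pair(math.log(PRIOR_MEDIAN), math.log(PRIOR_MEDIAN),
                       PRIOR_SD, PRIOR_SD, PRIOR_RHO, rng)
    for _ in range(20000)
]

# Back-transform to the beta scale; every draw is positive by construction.
beta_draws = [(math.exp(lv), math.exp(lp)) for lv, lp in log_draws]

print("sample correlation on the log scale:",
      round(sample_correlation(log_draws), 3))
```

Working on the log scale keeps every draw of β positive, and a positive correlation encodes the belief that culturing practice is similar in the two treatment groups.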
In ongoing research, we are extending the Bayesian methods to incorporate higher-dimensional covariates. We are also working on methods for longitudinal and time-to-event outcomes.