Recently proposed double-robust estimators for a population mean from incomplete data and for a finite number of counterfactual means can have much higher efficiency than the usual double-robust estimators under misspecification of the outcome model. In this paper, we derive a new class of double-robust estimators for the parameters of regression models with incomplete cross-sectional or longitudinal data, and of marginal structural mean models for cross-sectional data with similar efficiency properties. Unlike the recent proposals, our estimators solve outcome regression estimating equations. In a simulation study, the new estimator shows improvements in variance relative to the standard double-robust estimator that are in agreement with those suggested by asymptotic theory.
Drop-out; Marginal structural model; Missing at random
We derive estimators of the mean of a function of a quality-of-life adjusted failure time, in the presence of competing right censoring mechanisms. Our approach allows for the possibility that some or all of the competing censoring mechanisms are associated with the endpoint, even after adjustment for recorded prognostic factors, with the degree of residual association possibly different for distinct censoring processes. Our methods generalize from a single to many censoring processes and from ignorable to non-ignorable censoring processes.
Cause-specific; Dependent censoring; Inverse weighted probability; Sensitivity analysis
We consider nonparametric regression of a scalar outcome on a covariate when the outcome is missing at random (MAR) given the covariate and other observed auxiliary variables. We propose a class of augmented inverse probability weighted (AIPW) kernel estimating equations for nonparametric regression under MAR. We show that AIPW kernel estimators are consistent when the probability that the outcome is observed, that is, the selection probability, is either known by design or estimated under a correctly specified model. In addition, we show that a specific AIPW kernel estimator in our class that employs the fitted values from a model for the conditional mean of the outcome given covariates and auxiliaries is double-robust, that is, it remains consistent if this model is correctly specified even if the selection probabilities are modeled or specified incorrectly. Furthermore, when both models are correctly specified, this double-robust estimator attains the smallest possible asymptotic variance of all AIPW kernel estimators and maximally extracts the information in the auxiliary variables. We also describe a simple correction to the AIPW kernel estimating equations that, while preserving double-robustness, ensures an efficiency improvement over nonaugmented IPW estimation when the selection model is correctly specified, regardless of the validity of the second model used in the augmentation term. We perform simulations to evaluate the finite sample performance of the proposed estimators, and apply the methods to the analysis of the AIDS Costs and Services Utilization Survey data. Technical proofs are available online.
Asymptotics; Augmented kernel estimating equations; Double robustness; Efficiency; Inverse probability weighted kernel estimating equations; Kernel smoothing
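As a rough illustration of the augmented inverse probability weighted kernel idea described in the abstract above, the following minimal sketch computes a local-constant (Nadaraya-Watson type) AIPW estimate at a single point. It is a simplified sketch under stated assumptions, not the authors' estimator: the function name `aipw_kernel_estimate`, the Gaussian kernel, the fixed bandwidth, and the treatment of the selection probabilities `pi` and the outcome-model fitted values `delta` as given inputs are all choices made here for illustration.

```python
import math

def aipw_kernel_estimate(z0, z, y, observed, pi, delta, bandwidth):
    """Local-constant AIPW kernel estimate of E[Y | Z = z0] (illustrative sketch).

    z        : covariate values
    y        : outcomes (may be None where the outcome is missing)
    observed : 1 if the outcome is observed, 0 otherwise
    pi       : assumed selection probabilities (hypothetical inputs)
    delta    : assumed fitted values from an outcome model (hypothetical inputs)
    """
    numerator = 0.0
    denominator = 0.0
    for zi, yi, ri, pii, di in zip(z, y, observed, pi, delta):
        # Gaussian kernel weight centred at z0 (normalising constant cancels).
        weight = math.exp(-0.5 * ((zi - z0) / bandwidth) ** 2)
        # Inverse probability weighted outcome; zero when the outcome is missing.
        ipw_term = yi / pii if ri else 0.0
        # Augmentation term built from the outcome-model fitted value.
        augmentation = (ri / pii - 1.0) * di
        numerator += weight * (ipw_term - augmentation)
        denominator += weight
    return numerator / denominator

# Toy data: the outcome model is exactly right (every fitted value equals 2.0),
# while the selection probabilities are deliberately arbitrary.
z = [0.0, 0.5, 1.0, 1.5]
y = [2.0, None, 2.0, None]
observed = [1, 0, 1, 0]
wrong_pi = [0.9, 0.2, 0.3, 0.7]
delta = [2.0, 2.0, 2.0, 2.0]
estimate = aipw_kernel_estimate(0.75, z, y, observed, wrong_pi, delta, bandwidth=0.5)
```

In this toy example the estimate stays at the outcome-model value even though the selection probabilities are arbitrary, which is the flavour of the double-robustness property the abstract describes. With complete data and all selection probabilities equal to one, the function reduces to an ordinary kernel-weighted average.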
We consider the estimation of the parameters indexing a parametric model for the conditional distribution of a diagnostic marker given covariates and disease status. Such models are useful for the evaluation of whether and to what extent a marker’s ability to accurately detect or discard disease depends on patient characteristics. A frequent problem that complicates the estimation of the model parameters is that estimation must be conducted from observational studies. Often, in such studies not all patients undergo the gold standard assessment of disease. Furthermore, the decision as to whether a patient undergoes verification is not controlled by study design. In such scenarios, maximum likelihood estimators based on subjects with observed disease status are generally biased. In this paper, we propose estimators for the model parameters that adjust for selection to verification that may depend on measured patient characteristics and additionally adjust for an assumed degree of residual association. Such estimators may be used as part of a sensitivity analysis for plausible degrees of residual association. We describe a doubly robust estimator that has the attractive feature of being consistent if either a model for the probability of selection to verification or a model for the probability of disease among the verified subjects (but not necessarily both) is correct.
Missing at Random; Nonignorable; Missing Covariate; Sensitivity Analysis; Semiparametric; Diagnosis
The ROC (Receiver Operating Characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias originally developed by Rotnitzky et al. (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a nonignorable process of selection to verification. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computer tomography, a diagnostic test for calcification of the arteries.
Diagnostic test; Nonignorable; Semiparametric model; Sensitivity analysis; Sensitivity; Specificity
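To make the verification bias problem in the abstract above concrete, the sketch below computes one point of an ROC curve using inverse probability weighting of the verified subjects. This is a deliberately simpler technique than the doubly robust method the abstract studies: it assumes verification is ignorable given the recorded data and that the verification probabilities `pi` are known. The function name `ipw_roc_point` and the toy data are hypothetical.

```python
def ipw_roc_point(threshold, test, verified, disease, pi):
    """Return (false positive rate, true positive rate) at one threshold.

    Only verified subjects contribute, each weighted by 1 / pi, where pi is
    the assumed probability of being selected for verification.
    """
    true_pos = pos_total = false_pos = neg_total = 0.0
    for t, v, d, p in zip(test, verified, disease, pi):
        if not v:
            continue  # disease status unknown; subject is represented via weights
        weight = 1.0 / p
        if d == 1:
            pos_total += weight
            true_pos += weight * (t > threshold)
        else:
            neg_total += weight
            false_pos += weight * (t > threshold)
    return false_pos / neg_total, true_pos / pos_total

# Toy data: six subjects, four of them verified.
test = [0.2, 0.4, 0.6, 0.8, 0.9, 0.1]
verified = [1, 1, 1, 1, 0, 0]
disease = [0, 1, 0, 1, None, None]
pi = [0.5, 0.5, 1.0, 1.0, 0.5, 0.5]
fpr, tpr = ipw_roc_point(0.5, test, verified, disease, pi)
```

Sweeping `threshold` over the observed test values traces out the weighted ROC curve. Setting every entry of `pi` to one recovers the naive estimate based on verified subjects only, which is the estimate the abstract warns may be badly biased.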
We present new statistical analyses of data arising from a clinical trial designed to compare two-stage dynamic treatment regimes (DTRs) for advanced prostate cancer. The trial protocol mandated that patients were to be initially randomized among four chemotherapies, and that those who responded poorly were to be rerandomized to one of the remaining candidate therapies. The primary aim was to compare the DTRs’ overall success rates, with success defined by the occurrence of successful responses in each of two consecutive courses of the patient’s therapy. Of the one hundred and fifty study participants, forty-seven did not complete their therapy per the algorithm. However, thirty-five of them did so for reasons that precluded further chemotherapy, that is, toxicity and/or progressive disease. Consequently, rather than comparing the overall success rates of the DTRs in the unrealistic event that these patients had remained on their assigned chemotherapies, we conducted an analysis that compared viable switch rules defined by the per-protocol rules but with the additional provision that patients who developed toxicity or progressive disease switch to a non-prespecified therapeutic or palliative strategy. This modification involved consideration of bivariate per-course outcomes encoding both efficacy and toxicity. We used numerical scores elicited from the trial’s Principal Investigator to quantify the clinical desirability of each bivariate per-course outcome, and defined one endpoint as their average over all courses of treatment. Two other simpler sets of scores as well as log survival time also were used as endpoints. Estimation of each DTR-specific mean score was conducted using inverse probability weighted methods that assumed that missingness in the twelve remaining drop-outs was informative but explainable in that it only depended on past recorded data. We conducted additional worst-best case analyses to evaluate sensitivity of our findings to extreme departures from the explainable drop-out assumption.
Causal inference; Efficiency; Informative dropout; Inverse probability weighting; Marginal structural models; Optimal regime; Simultaneous confidence intervals
Modern epidemiologic studies often aim to evaluate the causal effect of a point exposure on the risk of a disease from cohort or case-control observational data. Because confounding bias is of serious concern in such non-experimental studies, investigators routinely adjust for a large number of potential confounders in a logistic regression analysis of the effect of exposure on disease outcome. Unfortunately, when confounders are not correctly modeled, standard logistic regression is likely biased in its estimate of the effect of exposure, potentially leading to erroneous conclusions. We partially resolve this serious limitation of standard logistic regression analysis with a new iterative approach that we call ProRetroSpective estimation, which carefully combines standard logistic regression with a logistic regression analysis in which exposure is the dependent variable and the outcome and confounders are the independent variables. As a result, we obtain a correct estimate of the exposure-outcome odds ratio if either the standard logistic regression of the outcome given exposure and confounding factors is correct, or the regression model of exposure given the outcome and confounding factors is correct, but not necessarily both; that is, the estimator is double-robust. In fact, it also has certain advantageous efficiency properties. The approach is general in that it applies to both cohort and case-control studies whether the design of the study is matched or unmatched on a subset of covariates. Finally, an application illustrates the methods using data from the National Cancer Institute's Black/White Cancer Survival Study.
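The reason a regression of exposure on outcome can inform the exposure-outcome odds ratio at all is the symmetry of the odds ratio. The minimal sketch below checks that symmetry on a single hypothetical 2x2 table; it illustrates only this underlying fact and is not the ProRetroSpective algorithm. The function names and the counts are made up for illustration.

```python
def prospective_odds_ratio(a, b, c, d):
    """Odds of disease among the exposed divided by odds among the unexposed.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    odds_disease_exposed = a / b
    odds_disease_unexposed = c / d
    return odds_disease_exposed / odds_disease_unexposed

def retrospective_odds_ratio(a, b, c, d):
    """Odds of exposure among cases divided by odds of exposure among non-cases."""
    odds_exposure_cases = a / c
    odds_exposure_noncases = b / d
    return odds_exposure_cases / odds_exposure_noncases

# Hypothetical counts for one stratum of confounders.
a, b, c, d = 30, 70, 10, 90
forward = prospective_odds_ratio(a, b, c, d)
backward = retrospective_odds_ratio(a, b, c, d)
```

Both functions reduce algebraically to `a * d / (b * c)`, so the two values agree up to floating point rounding.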
Standardized means, commonly used in observational studies in epidemiology to adjust for potential confounders, are equal to inverse probability weighted means with inverse weights equal to the empirical propensity scores. More refined standardization corresponds with empirical propensity scores computed under more flexible models. Unnecessary standardization induces efficiency loss. However, according to the theory of inverse probability weighted estimation, propensity scores estimated under more flexible models induce improvement in the precision of inverse probability weighted means. This apparent contradiction is clarified by explicitly stating the assumptions under which the improvement in precision is attained.
Causal inference; Propensity score; Standardized mean
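The equality between standardized means and inverse probability weighted means stated in the abstract above can be checked numerically on a small example. The sketch below assumes a single discrete stratifying variable and at least one treated subject in every stratum; the function names and the toy records are hypothetical.

```python
from collections import defaultdict

def standardized_mean(records, arm=1):
    """Average of stratum-specific outcome means in `arm`, weighted by stratum size."""
    n = len(records)
    stratum_size = defaultdict(int)
    arm_count = defaultdict(int)
    arm_sum = defaultdict(float)
    for stratum, treatment, outcome in records:
        stratum_size[stratum] += 1
        if treatment == arm:
            arm_count[stratum] += 1
            arm_sum[stratum] += outcome
    return sum(
        stratum_size[s] / n * arm_sum[s] / arm_count[s] for s in stratum_size
    )

def ipw_mean(records, arm=1):
    """Inverse probability weighted mean using empirical propensity scores."""
    n = len(records)
    stratum_size = defaultdict(int)
    arm_count = defaultdict(int)
    for stratum, treatment, _ in records:
        stratum_size[stratum] += 1
        if treatment == arm:
            arm_count[stratum] += 1
    # Empirical propensity score: proportion in `arm` within each stratum.
    propensity = {s: arm_count[s] / stratum_size[s] for s in stratum_size}
    return sum(y / propensity[s] for s, a, y in records if a == arm) / n

# Toy records: (stratum, treatment, outcome).
records = [
    ("young", 1, 3.0), ("young", 1, 5.0), ("young", 0, 2.0), ("young", 0, 4.0),
    ("old", 1, 10.0), ("old", 0, 6.0), ("old", 0, 8.0), ("old", 0, 7.0),
]
standardized = standardized_mean(records)
weighted = ipw_mean(records)
```

On these records both quantities equal 7.0: the stratum means among the treated are 4.0 and 10.0, and each stratum holds half of the sample.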
In this companion article to “Dynamic Regime Marginal Structural Mean Models for Estimation of Optimal Dynamic Treatment Regimes, Part I: Main Content” [Orellana, Rotnitzky and Robins (2010), IJB, Vol. 6, Iss. 2, Art. 7] we present (i) proofs of the claims in that paper, (ii) a proposal for the computation of a confidence set for the optimal index when this lies in a finite set, and (iii) an example to aid the interpretation of the positivity assumption.
dynamic treatment regime; double-robust; inverse probability weighted; marginal structural model; optimal treatment regime; causality
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007).
Doubly robust; Generalized odds ratio; Locally efficient; Semiparametric logistic regression
We consider estimation, from a double-blind randomized trial, of treatment effect within levels of base-line covariates on an outcome that is measured after a post-treatment event E has occurred in the subpopulation 𝒫E,E that would experience event E regardless of treatment. Specifically, we consider estimation of the parameters γ indexing models for the outcome mean conditional on treatment and base-line covariates in the subpopulation 𝒫E,E. Such parameters are not identified from randomized trial data but become identified if additionally it is assumed that the subpopulation 𝒫Ē,E of subjects that would experience event E under the second treatment but not under the first is empty and a parametric model is posited for the conditional probability that a subject experiences event E if assigned to the first treatment given that the subject would experience the event if assigned to the second treatment, his or her outcome under the second treatment and his or her pretreatment covariates. We develop a class of estimating equations whose solutions comprise, up to asymptotic equivalence, all consistent and asymptotically normal estimators of γ under these two assumptions. In addition, we derive a locally semiparametric efficient estimator of γ. We apply our methods to estimate the effect on mean viral load of vaccine versus placebo after infection with human immunodeficiency virus (the event E) in a placebo-controlled randomized acquired immune deficiency syndrome vaccine trial.
Counterfactuals; Missing data; Potential outcomes; Principal stratification; Structural model; Vaccine trials