Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero.
Causal inference; Enhanced propensity score model; Missing at random; No unmeasured confounders; Outcome regression
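As a point of reference for the "usual" doubly robust estimator mentioned above, the following is a minimal sketch of the standard augmented inverse-probability-weighted (AIPW) estimate of a mean from incomplete data. It assumes the estimated propensity scores and outcome-regression predictions have already been fitted elsewhere; the function name `aipw_mean` and its argument names are hypothetical, and the sketch does not implement the alternative estimators the authors propose.

```python
def aipw_mean(y, observed, pi_hat, m_hat):
    """Standard AIPW estimate of E[Y] when some outcomes are missing.

    y        : outcomes (entries for missing cases are ignored, may be None)
    observed : 1 if the outcome was observed, 0 otherwise
    pi_hat   : estimated probability that the outcome is observed
    m_hat    : outcome-regression prediction of Y given covariates
    """
    n = len(y)
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, pi_hat, m_hat):
        # Inverse-probability-weighted term uses observed outcomes only.
        ipw_term = yi / pi if ri else 0.0
        # Augmentation term borrows from the outcome regression.
        augmentation = (ri - pi) / pi * mi
        total += ipw_term - augmentation
    return total / n
```

Because each observed outcome is divided by `pi`, a single estimated propensity score near zero can dominate the sum, which is the sensitivity the abstract describes.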
A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial.
Coarsening at random; Discrete hazard; Dropout; Longitudinal data; Missing at random
Doubly robust estimation combines a form of outcome regression with a model for the exposure (i.e., the propensity score) to estimate the causal effect of an exposure on an outcome. When used individually to estimate a causal effect, both outcome regression and propensity score methods are unbiased only if the statistical model is correctly specified. The doubly robust estimator combines these 2 approaches such that only 1 of the 2 models need be correctly specified to obtain an unbiased effect estimator. In this introduction to doubly robust estimators, the authors present a conceptual overview of doubly robust estimation, a simple worked example, results from a simulation study examining performance of estimated and bootstrapped standard errors, and a discussion of the potential advantages and limitations of this method. The supplementary material for this paper, which is posted on the Journal's Web site (http://aje.oupjournals.org/), includes a demonstration of the doubly robust property (Web Appendix 1) and a description of a SAS macro (SAS Institute, Inc., Cary, North Carolina) for doubly robust estimation, available for download at http://www.unc.edu/~mfunk/dr/.
causal inference; epidemiologic methods; propensity score
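The combination of outcome regression and propensity score described above can be sketched for an average treatment effect as follows. This is an illustrative Python sketch of the standard AIPW form, not the SAS macro the authors provide; the name `aipw_ate` and its arguments are hypothetical, and the fitted propensity scores and outcome predictions are assumed to come from models fitted elsewhere.

```python
def aipw_ate(y, z, e_hat, m1_hat, m0_hat):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y      : observed outcomes
    z      : exposure indicator (1 = exposed, 0 = unexposed)
    e_hat  : estimated propensity score P(Z = 1 | X)
    m1_hat : predicted outcome under exposure, from the outcome model
    m0_hat : predicted outcome under no exposure, from the outcome model
    """
    n = len(y)
    total = 0.0
    for yi, zi, e, m1, m0 in zip(y, z, e_hat, m1_hat, m0_hat):
        # Outcome-model prediction plus a weighted residual correction.
        mu1 = m1 + zi * (yi - m1) / e
        mu0 = m0 + (1 - zi) * (yi - m0) / (1.0 - e)
        total += mu1 - mu0
    return total / n
```

If the outcome model is correct, the residual corrections average to zero; if the propensity model is correct, the weighting removes the bias of a wrong outcome model.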
Methods for estimating average treatment effects, under the assumption of no unmeasured confounders, include regression models; propensity score adjustments using stratification, weighting, or matching; and doubly robust estimators (a combination of both). Researchers continue to debate about the best estimator for outcomes such as health care cost data, as they are usually characterized by an asymmetric distribution and heterogeneous treatment effects. Challenges in finding the right specifications for regression models are well documented in the literature. Propensity score estimators are proposed as alternatives to overcoming these challenges. Using simulations, we find that in moderate size samples (n = 5000), balancing on propensity scores that are estimated from saturated specifications can balance the covariate means across treatment arms but fails to balance higher-order moments and covariances amongst covariates. Therefore, unlike regression models, even if a formal model for outcomes is not required, propensity score estimators can be inefficient at best and biased at worst for health care cost data. Our simulation study, designed to take a ‘proof by contradiction’ approach, proves that no one estimator can be considered the best under all data generating processes for outcomes such as costs. The inverse-propensity weighted estimator is most likely to be unbiased under alternate data generating processes but is prone to bias under misspecification of the propensity score model and is inefficient compared to an unbiased regression estimator. Our results show that there are no ‘magic bullets’ when it comes to estimating treatment effects in health care costs. Care should be taken before naively applying any one estimator to estimate average treatment effects in these data. We illustrate the performance of alternative methods in a cost dataset on breast cancer treatment.
Propensity score; non-linear regression; average treatment effect; health care costs
The inverse of the nonparametric information operator is key to finding doubly robust estimators and the semiparametric efficient estimator in missing data problems. It is known that no closed-form expression for the inverse of the nonparametric information operator exists when missing data form nonmonotone patterns. Neumann series is usually applied to approximate the inverse. However, Neumann series approximation is only known to converge in L2 norm, which is not sufficient for establishing statistical properties of the estimators yielded from the approximation. In this article, we show that L∞ convergence of the Neumann series approximations to the inverse of the non-parametric information operator and to the efficient scores in missing data problems can be obtained under very simple conditions. This paves the way to the study of the asymptotic properties of the doubly robust estimators and the locally semiparametric efficient estimator in those difficult situations.
Auxiliary information; Induction; Rate of convergence; Weighted estimating equation
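To make the Neumann series idea concrete, here is a minimal finite-dimensional sketch in which a small matrix stands in for the operator. It approximates the solution of (I - A)x = b by successive approximation, which converges when the spectral radius of A is below 1. The function `neumann_solve` is a hypothetical illustration and says nothing about the L∞ convergence results established in the article.

```python
def neumann_solve(a, b, n_terms=60):
    """Approximate x solving (I - A) x = b by the truncated Neumann series.

    After n_terms updates of x <- b + A x (starting from x = b), x equals
    the partial sum b + A b + A^2 b + ... + A^n_terms b.
    """
    x = list(b)
    for _ in range(n_terms):
        # Matrix-vector product A x, written out for a list-of-rows matrix.
        ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
        x = [b_i + ax_i for b_i, ax_i in zip(b, ax)]
    return x
```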
Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.
Doubly robust; Missing at random; Multiple imputation; Nearest neighbor; Nonparametric imputation; Sensitivity analysis
The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity-score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity-score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five-number summaries; and graphical methods such as quantile–quantile plots, side-by-side boxplots, and non-parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity-score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.
balance; goodness-of-fit; observational study; propensity score; matching; propensity-score matching; standardized difference; bias
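The standardized difference for a continuous covariate, the first balance diagnostic listed above, can be sketched as follows. The function name is hypothetical; the formula is the commonly used one, with the difference in means divided by the square root of the average of the two group variances.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(treated, control):
    """Standardized difference in means for a continuous covariate."""
    # Pool the two sample variances with equal weight, regardless of group size.
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2.0)
    return (mean(treated) - mean(control)) / pooled_sd
```

Unlike a t statistic, this quantity does not grow with sample size, which is why it is suited to comparing balance before and after matching.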
This article develops semiparametric approaches for estimation of propensity scores and causal survival functions from prevalent survival data. The analytical problem arises when the prevalent sampling is adopted for collecting failure times and, as a result, the covariates are incompletely observed due to their association with failure time. The proposed procedure for estimating propensity scores shares interesting features similar to the likelihood formulation in case-control study, but in our case it requires additional consideration in the intercept term. The result shows that the corrected propensity scores in logistic regression setting can be obtained through standard estimation procedure with specific adjustments on the intercept term. For causal estimation, two different types of missing sources are encountered in our model: one can be explained by potential outcome framework; the other is caused by the prevalent sampling scheme. Statistical analysis without adjusting bias from both sources of missingness will lead to biased results in causal inference. The proposed methods were partly motivated by and applied to the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked data for women diagnosed with breast cancer.
Case-control study; Prevalent sampling; Propensity scores
The quality of propensity scores is traditionally measured by assessing how well they make the distributions of covariates in the treatment and control groups match, which we refer to as “good balance”. Good balance guarantees less biased estimates of the treatment effect. However, the cost of achieving good balance is that the variance of the estimates increases due to a reduction in effective sample size, either through the introduction of propensity score weights or dropping cases when propensity score matching. In this paper, we investigate whether it is best to optimize the balance or to settle for a less than optimal balance and use double robust estimation to adjust for remaining differences. We compare treatment effect estimates from regression, propensity score weighting, and double robust estimation with varying levels of effort expended to achieve balance using data from a study about the differences in outcomes by HIV status in heterosexually active homeless men residing in Los Angeles. Because of how costly data collection efforts are for this population, it is important to find an alternative estimation method that does not reduce effective sample size as much as methods that aggressively aim to optimize balance. Results from a simulation study suggest that there are instances in which we can obtain more precise treatment effect estimates without increasing bias too much by using a combination of regression and propensity score weights that achieve a less than optimal balance. There is a bias-variance tradeoff at work in propensity score estimation; every step toward better balance usually means an increase in variance and at some point a marginal decrease in bias may not be worth the associated increase in variance.
Propensity score; Double robust estimation; HIV status; Homeless men
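The loss of effective sample size that comes with propensity score weights can be quantified with Kish's approximation, sketched below. The abstract does not say which formula the authors used, so treating Kish's formula as the measure is an assumption, and the function name is hypothetical.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq
```

Equal weights return the actual sample size; the more variable the weights, the smaller the result, which is the variance cost of pursuing tighter balance.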
The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.
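Of the four methods listed above, inverse probability of treatment weighting is the simplest to sketch. The example below assumes the propensity scores have already been estimated and targets the average treatment effect; both function names are hypothetical.

```python
def iptw_weights(z, e_hat):
    """Weights 1/e for treated subjects and 1/(1 - e) for untreated subjects."""
    return [1.0 / e if zi == 1 else 1.0 / (1.0 - e) for zi, e in zip(z, e_hat)]


def weighted_mean_difference(y, z, weights):
    """Difference in weighted outcome means between treated and untreated."""
    num1 = sum(w * yi for yi, zi, w in zip(y, z, weights) if zi == 1)
    den1 = sum(w for zi, w in zip(z, weights) if zi == 1)
    num0 = sum(w * yi for yi, zi, w in zip(y, z, weights) if zi == 0)
    den0 = sum(w for zi, w in zip(z, weights) if zi == 0)
    return num1 / den1 - num0 / den0
```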
The ROC (Receiver Operating Characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated due to not all subjects undergoing a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias originally developed by Rotnitzky et al. (2006) for estimating the area under the ROC curve. The DR method can be applied for continuous scaled tests and allows for a non ignorable process of selection to verification. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computer tomography, a diagnostic test for calcification of the arteries.
Diagnostic test; Nonignorable; Semiparametric model; Sensitivity analysis; Sensitivity; Specificity
For nonnegative measurements such as income or sick days, zero counts often have special status. Furthermore, the incidence of zero counts is often greater than expected for the Poisson model. This article considers a doubly semiparametric zero-inflated Poisson model to fit data of this type, which assumes two partially linear link functions in both the mean of the Poisson component and the probability of zero. We study a sieve maximum likelihood estimator for both the regression parameters and the nonparametric functions. We show, under routine conditions, that the estimators are strongly consistent. Moreover, the parameter estimators are asymptotically normal and first-order efficient, while the nonparametric components achieve the optimal convergence rates. Simulation studies suggest that the extra flexibility inherent from the doubly semiparametric model is gained with little loss in statistical efficiency. We also illustrate our approach with a dataset from a public health study.
Asymptotic efficiency; Partly linear model; Sieve maximum likelihood estimator; Zero-inflated Poisson model
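For orientation, the basic zero-inflated Poisson probability mass function that the model above builds on can be sketched as follows. This is the plain parametric form with a fixed Poisson mean and a fixed zero-inflation probability, not the doubly semiparametric model or the sieve estimator studied in the article; the names `zip_pmf`, `lam`, and `p_zero` are hypothetical.

```python
from math import exp, factorial


def zip_pmf(k, lam, p_zero):
    """P(Y = k) under a zero-inflated Poisson distribution.

    With probability p_zero the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    poisson = exp(-lam) * lam ** k / factorial(k)
    if k == 0:
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson
```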
Lack of randomization of nursing intervention in outcome effectiveness studies may lead to imbalanced covariates. Consequently, estimation of nursing intervention effect can be biased as in other observational studies. Propensity score analysis is an effective statistical method to reduce such bias and further derive causal effects in observational studies.
To illustrate the use of propensity score analysis in quantitative nursing research through an example of pain management effect on length of hospital stay.
Propensity scores are generated through a regression model treating the nursing intervention as the dependent variable and all confounding covariates as predictor variables. Then propensity scores are used to adjust for this nonrandomized assignment of nursing intervention through three approaches: regression covariance adjustment, stratification, and matching in the predictive outcome model for nursing intervention.
Propensity score analysis reduces the confounding covariates into a single variable of propensity score. After stratification and matching on propensity scores, observed covariates between nursing intervention groups are more balanced within each stratum or in the matched samples. The likelihood of receiving pain management is accounted for in the outcome model through the propensity scores. Both regression covariance adjustment and matching methods report a significant pain management effect on length of hospital stay in this example. The pain management effect can be regarded as causal when the strongly ignorable treatment assignment assumption holds.
Propensity score analysis provides an alternative statistical approach to the classical multivariate regression, stratification and matching techniques for examining the effects of nursing intervention with a large number of confounding covariates in the background. It can be used to derive causal effects of nursing intervention in observational studies under certain circumstances.
matching; nursing effectiveness research; nursing interventions; propensity score
Two approaches commonly used to deal with missing data are multiple
imputation (MI) and inverse-probability weighting (IPW). IPW is also used to
adjust for unequal sampling fractions. MI is generally more efficient than
IPW but more complex. Whereas IPW requires only a model for the probability
that an individual has complete data (a univariate outcome), MI needs a
model for the joint distribution of the missing data (a multivariate
outcome) given the observed data. Inadequacies in either model may lead to
important bias if large amounts of data are missing. A third approach
combines MI and IPW to give a doubly robust estimator. A fourth approach
(IPW/MI) combines MI and IPW but, unlike doubly robust methods, imputes only
isolated missing values and uses weights to account for remaining larger
blocks of unimputed missing data, such as would arise, e.g., in a cohort
study subject to sample attrition, and/or unequal sampling fractions. In
this article, we examine the performance, in terms of bias and efficiency,
of IPW/MI relative to MI and IPW alone and investigate whether the
Rubin's rules variance estimator is valid for IPW/MI. We prove that
the Rubin's rules variance estimator is valid for IPW/MI for linear
regression with an imputed outcome, we present simulations supporting the
use of this variance estimator in more general settings, and we demonstrate
that IPW/MI can have advantages over alternatives. IPW/MI is applied to data
from the National Child Development Study.
Marginal model; Missing at random; Survey weighting; 1958 British Birth Cohort
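Rubin's rules, whose validity for IPW/MI is examined above, combine the estimates from the imputed datasets as in the following minimal sketch. It covers a scalar parameter only, and the function name is hypothetical.

```python
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool m point estimates and their variances from multiply imputed data.

    Returns the pooled estimate and the total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation (sample) variance of the estimates.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total
```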
We propose a new causal parameter, which is a natural extension of existing approaches to causal inference such as marginal structural models. Modelling approaches are proposed for the difference between a treatment-specific counterfactual population distribution and the actual population distribution of an outcome in the target population of interest. Relevant parameters describe the effect of a hypothetical intervention on such a population and therefore we refer to these models as population intervention models. We focus on intervention models estimating the effect of an intervention in terms of a difference and ratio of means, called risk difference and relative risk if the outcome is binary. We provide a class of inverse-probability-of-treatment-weighted and doubly-robust estimators of the causal parameters in these models. The finite-sample performance of these new estimators is explored in a simulation study.
Attributable risk; Causal inference; Confounding; Counterfactual; Doubly-robust estimation; G-computation estimation; Inverse-probability-of-treatment-weighted estimation
In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been a little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means. Copyright © 2010 John Wiley & Sons, Ltd.
propensity score; observational study; binary data; risk difference; propensity-score matching; Monte Carlo simulations; bias; matching
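The recommended caliper can be computed as in the sketch below, followed by a simple greedy one-to-one matching within that caliper. Two assumptions are made for illustration: the standard deviation is taken as the sample standard deviation over all subjects supplied, and the greedy matching order is not taken from the paper. All function names are hypothetical.

```python
from math import log
from statistics import stdev


def logit(p):
    return log(p / (1.0 - p))


def caliper_width(propensity_scores, factor=0.2):
    """Caliper equal to `factor` times the SD of the logit of the propensity score."""
    return factor * stdev([logit(p) for p in propensity_scores])


def greedy_caliper_match(treated_logits, control_logits, caliper):
    """Match each treated subject to the nearest unused control within the caliper."""
    available = set(range(len(control_logits)))
    pairs = []
    for i, lt in enumerate(treated_logits):
        best = None
        # Sorted iteration keeps the result deterministic.
        for j in sorted(available):
            d = abs(lt - control_logits[j])
            if d <= caliper and (best is None or d < best[1]):
                best = (j, d)
        if best is not None:
            pairs.append((i, best[0]))
            available.remove(best[0])
    return pairs
```

Treated subjects with no control inside the caliper are left unmatched, which trades some sample size for closer matches.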
In this article, we study the estimation of mean response and regression coefficient in semiparametric regression problems when response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariates is correct. We conduct a number of simulations to compare the numerical performance between our estimators and other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.
Auxiliary covariate; High-dimensional data; Kernel estimation; Missing at random; Semiparametric regression
Propensity-score matching is frequently used in the medical literature to reduce or eliminate the effect of treatment selection bias when estimating the effect of treatments or exposures on outcomes using observational data. In propensity-score matching, pairs of treated and untreated subjects with similar propensity scores are formed. Recent systematic reviews of the use of propensity-score matching found that the large majority of researchers ignore the matched nature of the propensity-score matched sample when estimating the statistical significance of the treatment effect. We conducted a series of Monte Carlo simulations to examine the impact of ignoring the matched nature of the propensity-score matched sample on Type I error rates, coverage of confidence intervals, and variance estimation of the treatment effect. We examined estimating differences in means, relative risks, odds ratios, rate ratios from Poisson models, and hazard ratios from Cox regression models. We demonstrated that accounting for the matched nature of the propensity-score matched sample tended to result in type I error rates that were closer to the advertised level compared to when matching was not incorporated into the analyses. Similarly, accounting for the matched nature of the sample tended to result in confidence intervals with coverage rates that were closer to the nominal level, compared to when matching was not taken into account. Finally, accounting for the matched nature of the sample resulted in estimates of standard error that more closely reflected the sampling variability of the treatment effect compared to when matching was not taken into account.
propensity score; matching; propensity-score matching; variance estimation; coverage; simulations; type I error; observational studies
Propensity score weighting is sensitive to model misspecification and outlying weights that can unduly influence results. The authors investigated whether trimming large weights downward can improve the performance of propensity score weighting and whether the benefits of trimming differ by propensity score estimation method. In a simulation study, the authors examined the performance of weight trimming following logistic regression, classification and regression trees (CART), boosted CART, and random forests to estimate propensity score weights. Results indicate that although misspecified logistic regression propensity score models yield increased bias and standard errors, weight trimming following logistic regression can improve the accuracy and precision of final parameter estimates. In contrast, weight trimming did not improve the performance of boosted CART and random forests. The performance of boosted CART and random forests without weight trimming was similar to the best performance obtainable by weight trimmed logistic regression estimated propensity scores. While trimming may be used to optimize propensity score weights estimated using logistic regression, the optimal level of trimming is difficult to determine. These results indicate that although trimming can improve inferences in some settings, in order to consistently improve the performance of propensity score weighting, analysts should focus on the procedures leading to the generation of weights (i.e., proper specification of the propensity score model) rather than relying on ad-hoc methods such as weight trimming.
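Trimming large weights downward, as investigated above, can be sketched as capping every weight at a chosen percentile of the weight distribution. The nearest-rank percentile rule and the function name are assumptions made for illustration; the abstract does not specify how the authors defined the cutoff.

```python
def trim_weights(weights, percentile):
    """Cap weights at the given percentile (integer from 1 to 100).

    The cutoff is the nearest-rank percentile of the weights; any weight
    above it is set equal to it, and smaller weights are left unchanged.
    """
    ordered = sorted(weights)
    n = len(ordered)
    # Ceiling of percentile * n / 100 using integer arithmetic.
    rank = -(-percentile * n // 100)
    cutoff = ordered[max(rank, 1) - 1]
    return [min(w, cutoff) for w in weights]
```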
Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator are further investigated by simulations. We apply the proposed method to a clinical trial for HIV.
Dimension reduction; Inverse probability weighting; Kernel regression; Missing at random; Robustness to model misspecification
Theory on semiparametric efficient estimation in missing data problems has been systematically developed by Robins and his coauthors. Except in relatively simple problems, semiparametric efficient scores cannot be expressed in closed forms. Instead, the efficient scores are often expressed as solutions to integral equations. Neumann series was proposed in the form of successive approximation to the efficient scores in those situations. Statistical properties of the estimator based on the Neumann series approximation are difficult to obtain and as a result, have not been clearly studied. In this paper, we reformulate the successive approximation in a simple iterative form and study the statistical properties of the estimator based on the reformulation. We show that a doubly-robust locally-efficient estimator can be obtained following the algorithm in robustifying the likelihood score. The results can be applied to, among others, the parametric regression, the marginal regression, and the Cox regression when data are subject to missing values and the missing data are missing at random. A simulation study is conducted to evaluate the performance of the approach and a real data example is analyzed to demonstrate the use of the approach.
auxiliary covariates; information operator; non-monotone missing pattern; weighted estimating equations
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance functions models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007).
Doubly robust; Generalized odds ratio; Locally efficient; Semiparametric logistic regression
Mediation is usually assessed by a regression-based or structural equation modeling (SEM) approach that we will refer to as the classical approach. This approach relies on the assumption that there are no confounders that influence both the mediator, M, and the outcome, Y. This assumption holds if individuals are randomly assigned to levels of M but generally random assignment is not possible. We propose the use of propensity scores to help remove the selection bias that may result when individuals are not randomly assigned to levels of M. The propensity score is the probability that an individual receives a particular level of M. Results from a simulation study are presented to demonstrate this approach, referred to as Classical + Propensity Model (C+PM), confirming that the population parameters are recovered and that selection bias is successfully dealt with. Comparisons are made to the classical approach that does not include propensity scores. Propensity scores were estimated by a logistic regression model. If all confounders are included in the propensity model, then the C+PM is unbiased. If some, but not all, of the confounders are included in the propensity model, then the C+PM estimates are biased although not as severely as the classical approach (i.e. no propensity model is included).
Regression adjustment for the propensity score is a statistical method that reduces confounding from measured variables in observational data. A Bayesian propensity score analysis extends this idea by using simultaneous estimation of the propensity scores and the treatment effect. In this article, we conduct an empirical investigation of the performance of Bayesian propensity scores in the context of an observational study of the effectiveness of beta-blocker therapy in heart failure patients. We study the balancing properties of the estimated propensity scores. Traditional Frequentist propensity scores focus attention on balancing covariates that are strongly associated with treatment. In contrast, we demonstrate that Bayesian propensity scores can be used to balance the association between covariates and the outcome. This balancing property has the effect of reducing confounding bias because it reduces the degree to which covariates are outcome risk factors.
The applied literature on propensity scores has often cited the c-statistic as a measure of the ability of the propensity score to control confounding. However, a high c-statistic in the propensity model is neither necessary nor sufficient for control of confounding. Moreover, use of the c-statistic as a guide in constructing propensity scores may result in less overlap in propensity scores between treated and untreated subjects; this may require the analyst to restrict populations for inference. Such restrictions may reduce precision of estimates and change the population to which the estimate applies. Variable selection based on prior subject matter knowledge, empirical observation, and sensitivity analysis is preferable and avoids many of these problems.
Propensity scores; c-statistic; variable selection; confounding
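For readers unfamiliar with the quantity being criticized, the c-statistic of a propensity model can be computed as in the sketch below. It is the proportion of treated-untreated pairs in which the treated subject has the higher estimated propensity score, with ties counted as one half. The function name is hypothetical.

```python
def c_statistic(treated_scores, untreated_scores):
    """Proportion of treated-untreated pairs ranked concordantly by the score."""
    concordant = 0.0
    for t in treated_scores:
        for u in untreated_scores:
            if t > u:
                concordant += 1.0
            elif t == u:
                concordant += 0.5
    return concordant / (len(treated_scores) * len(untreated_scores))
```

A value near 1 means the two groups barely overlap on the propensity score, which illustrates the abstract's point that a high c-statistic can signal reduced overlap rather than better control of confounding.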