# Related Articles

Propensity score methods allow investigators to estimate causal treatment effects using observational or nonrandomized data. In this article we provide a practical illustration of the appropriate steps in conducting propensity score analyses. For illustrative purposes, we use a sample of current smokers who were discharged alive after being hospitalized with a diagnosis of acute myocardial infarction. The exposure of interest was receipt of smoking cessation counseling prior to hospital discharge and the outcome was mortality within 3 years of hospital discharge. We illustrate the following concepts: first, how to specify the propensity score model; second, how to match treated and untreated participants on the propensity score; third, how to compare the similarity of baseline characteristics between treated and untreated participants after stratifying on the propensity score, in a sample matched on the propensity score, or in a sample weighted by the inverse probability of treatment; fourth, how to estimate the effect of treatment on outcomes when using propensity score matching, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, or covariate adjustment using the propensity score. Finally, we compare the results of the propensity score analyses with those obtained using conventional regression adjustment.

doi:10.1080/00273171.2011.540480

PMCID: PMC3266945
PMID: 22287812
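The first step listed above, specifying the propensity score model, is most often a logistic regression of treatment on baseline covariates. The sketch below fits such a model in plain Python by batch gradient ascent on the log-likelihood. It is illustrative only. The function names, learning rate, iteration count and toy data are assumptions. In practice, a statistical package's logistic regression routine would be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_propensity_model(X, treated, lr=0.1, n_iter=5000):
    """Logistic regression of treatment (0/1) on covariates by batch gradient ascent.

    X is a list of covariate rows; an intercept column is added internally.
    Returns coefficients [intercept, b1, ..., bp].
    (Hypothetical helper written for this illustration.)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, t in zip(X, treated):
            xi = [1.0] + list(row)
            # Score contribution: (observed - predicted) * covariate value
            resid = t - sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j in range(p + 1):
                grad[j] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated probability of treatment given each subject's covariates."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]


# Hypothetical toy data: one covariate, treatment more common at higher values.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
treated = [0, 0, 1, 0, 1, 1]
beta = fit_propensity_model(X, treated)
ps = propensity_scores(X, beta)
```

At convergence, the fitted scores sum to the number of treated subjects, because the score equation for the intercept is the sum of the residuals. This gives a quick sanity check on the fit.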

The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.

doi:10.1080/00273171.2011.568786

PMCID: PMC3144483
PMID: 21818162
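Inverse probability of treatment weighting, one of the four methods listed above, is the simplest to write down. Treated subjects receive weight 1/e and untreated subjects receive weight 1/(1 - e), where e is the estimated propensity score. The minimal sketch below computes the resulting normalized (weights rescaled to sum to one within each group) difference in means. The function names and the data are hypothetical.

```python
def iptw_weights(treated, ps):
    """ATE weights: 1/e for treated subjects, 1/(1 - e) for untreated subjects."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(treated, ps)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iptw_difference_in_means(y, treated, ps):
    """Normalized IPTW estimate of the average treatment effect (treated minus untreated)."""
    w = iptw_weights(treated, ps)
    y1 = [yi for yi, t in zip(y, treated) if t == 1]
    w1 = [wi for wi, t in zip(w, treated) if t == 1]
    y0 = [yi for yi, t in zip(y, treated) if t == 0]
    w0 = [wi for wi, t in zip(w, treated) if t == 0]
    return weighted_mean(y1, w1) - weighted_mean(y0, w0)


# Hypothetical data: outcome, treatment indicator, estimated propensity score.
y = [3.0, 5.0, 1.0, 2.0]
treated = [1, 1, 0, 0]
ps = [0.6, 0.4, 0.5, 0.3]
estimate = iptw_difference_in_means(y, treated, ps)
```

When every subject has the same propensity score, the weights are constant within each group. The estimate then reduces to the crude difference in means.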

Background

Lack of randomization of nursing interventions in outcome effectiveness studies may lead to imbalanced covariates. Consequently, as in other observational studies, estimates of the effect of a nursing intervention can be biased. Propensity score analysis is an effective statistical method for reducing such bias and for deriving causal effects in observational studies.

Objectives

To illustrate the use of propensity score analysis in quantitative nursing research through an example of pain management effect on length of hospital stay.

Methods

Propensity scores are generated through a regression model treating the nursing intervention as the dependent variable and all confounding covariates as predictor variables. Then propensity scores are used to adjust for this nonrandomized assignment of nursing intervention through three approaches: regression covariance adjustment, stratification, and matching in the predictive outcome model for nursing intervention.

Results

Propensity score analysis reduces the confounding covariates into a single variable of propensity score. After stratification and matching on propensity scores, observed covariates between nursing intervention groups are more balanced within each stratum or in the matched samples. The likelihood of receiving pain management is accounted for in the outcome model through the propensity scores. Both regression covariance adjustment and matching methods report a significant pain management effect on length of hospital stay in this example. The pain management effect can be regarded as causal when the strongly ignorable treatment assignment assumption holds.

Discussion

Propensity score analysis provides an alternative statistical approach to the classical multivariate regression, stratification and matching techniques for examining the effects of nursing intervention with a large number of confounding covariates in the background. It can be used to derive causal effects of nursing intervention in observational studies under certain circumstances.

doi:10.1097/NNR.0b013e31818c66f6

PMCID: PMC2778306
PMID: 19018219

matching; nursing effectiveness research; nursing interventions; propensity score
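The stratification approach described in this abstract can be sketched in three steps:

1. Group subjects into strata by estimated propensity score (quintiles are a common choice).
2. Compute the treated-minus-untreated difference in mean outcome within each stratum.
3. Average these differences, weighting each stratum by its size.

The cut-point rule, the function names and the test data are assumptions for illustration. This is not the exact procedure used in the study.

```python
from statistics import mean


def strata_cutpoints(ps, k):
    """Approximate k-tile cut-points of the propensity score (simple order-statistic rule)."""
    s = sorted(ps)
    n = len(s)
    return [s[(n * q) // k] for q in range(1, k)]


def stratified_effect(y, treated, ps, k=5):
    """Size-weighted average of within-stratum differences in mean outcome (hypothetical helper)."""
    cuts = strata_cutpoints(ps, k)
    # Stratum index = number of cut-points at or below the subject's score (0..k-1).
    strata = [sum(e >= c for c in cuts) for e in ps]
    n = len(y)
    estimate = 0.0
    for s in range(k):
        members = [i for i in range(n) if strata[i] == s]
        if not members:
            continue  # ties in the scores can leave a stratum empty
        y1 = [y[i] for i in members if treated[i] == 1]
        y0 = [y[i] for i in members if treated[i] == 0]
        if not y1 or not y0:
            raise ValueError(f"stratum {s} lacks treated or untreated subjects")
        estimate += len(members) / n * (mean(y1) - mean(y0))
    return estimate
```

Weighting strata by their total size targets the average effect in the whole sample. Weighting by the number of treated subjects per stratum would instead target the effect in the treated.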

The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity-score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity-score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five-number summaries; and graphical methods such as quantile–quantile plots, side-by-side boxplots, and non-parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity-score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.

doi:10.1002/sim.3697

PMCID: PMC3472075
PMID: 19757444

balance; goodness-of-fit; observational study; propensity score; matching; propensity-score matching; standardized difference; bias
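The first two balance diagnostics named in this abstract are straightforward to compute. The sketch below assumes the usual definition of the standardized difference: the difference in means divided by the square root of the average of the two group variances. For a binary covariate, the group variances are taken as p(1 - p). The function names are hypothetical. In a matched sample, these quantities would be computed on the matched subjects only.

```python
import math
from statistics import mean, variance


def std_diff_continuous(x_treated, x_untreated):
    """(mean_T - mean_U) / sqrt((s_T^2 + s_U^2) / 2) for a continuous covariate."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_untreated)) / 2.0)
    return (mean(x_treated) - mean(x_untreated)) / pooled_sd


def std_diff_binary(p_treated, p_untreated):
    """Standardized difference in the prevalence of a binary covariate."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_untreated * (1 - p_untreated)) / 2.0
    )
    return (p_treated - p_untreated) / pooled_sd


def variance_ratio(x_treated, x_untreated):
    """Ratio of sample variances; values near 1 indicate similar spread."""
    return variance(x_treated) / variance(x_untreated)
```

Unlike a p-value, the standardized difference does not depend on sample size. It can therefore be compared between the full sample and the smaller matched sample.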

Propensity-score matching is frequently used in the medical literature to reduce or eliminate the effect of treatment selection bias when estimating the effect of treatments or exposures on outcomes using observational data. In propensity-score matching, pairs of treated and untreated subjects with similar propensity scores are formed. Recent systematic reviews of the use of propensity-score matching found that the large majority of researchers ignore the matched nature of the propensity-score matched sample when estimating the statistical significance of the treatment effect. We conducted a series of Monte Carlo simulations to examine the impact of ignoring the matched nature of the propensity-score matched sample on Type I error rates, coverage of confidence intervals, and variance estimation of the treatment effect. We examined estimating differences in means, relative risks, odds ratios, rate ratios from Poisson models, and hazard ratios from Cox regression models. We demonstrated that accounting for the matched nature of the propensity-score matched sample tended to result in type I error rates that were closer to the advertised level compared to when matching was not incorporated into the analyses. Similarly, accounting for the matched nature of the sample tended to result in confidence intervals with coverage rates that were closer to the nominal level, compared to when matching was not taken into account. Finally, accounting for the matched nature of the sample resulted in estimates of standard error that more closely reflected the sampling variability of the treatment effect compared to when matching was not taken into account.

doi:10.2202/1557-4679.1146

PMCID: PMC2949360
PMID: 20949126

propensity score; matching; propensity-score matching; variance estimation; coverage; simulations; type I error; observational studies
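The contrast in this abstract can be made concrete for a difference in means. With 1:1 matched pairs, a paired analysis bases the standard error on the within-pair differences. An independent-samples analysis ignores the pairing. The sketch below computes both. The data are made up, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def paired_difference(y_treated, y_untreated):
    """Estimate and SE from within-pair differences; element i of each list is one matched pair."""
    d = [a - b for a, b in zip(y_treated, y_untreated)]
    return mean(d), stdev(d) / math.sqrt(len(d))


def independent_difference(y_treated, y_untreated):
    """Same point estimate, but an SE that treats the two groups as independent samples."""
    se = math.sqrt(variance(y_treated) / len(y_treated)
                   + variance(y_untreated) / len(y_untreated))
    return mean(y_treated) - mean(y_untreated), se


# Hypothetical outcomes for four matched pairs.
y_t = [5.0, 6.0, 7.0, 8.0]
y_u = [4.0, 5.0, 6.0, 7.5]
est_p, se_p = paired_difference(y_t, y_u)
est_i, se_i = independent_difference(y_t, y_u)
```

With complete 1:1 pairs, the point estimate is identical under both analyses. Only the standard error changes, and it is smaller here because outcomes within a pair are positively correlated.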

In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means. Copyright © 2010 John Wiley & Sons, Ltd.

doi:10.1002/pst.433

PMCID: PMC3120982
PMID: 20925139

propensity score; observational study; binary data; risk difference; propensity-score matching; Monte Carlo simulations; bias; matching
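The recommended approach can be sketched as greedy 1:1 nearest-neighbor matching without replacement on the logit of the propensity score. The caliper is 0.2 standard deviations of that logit. Three choices here are simplifying assumptions: treated subjects are matched greedily in the order they appear, the standard deviation is computed over the whole sample rather than pooled within groups, and the function names are hypothetical.

```python
import math
from statistics import stdev


def logit(p):
    return math.log(p / (1.0 - p))


def caliper_match(ps, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement on the logit of the
    propensity score. Returns a list of (treated_index, untreated_index) pairs."""
    lp = [logit(e) for e in ps]
    caliper = caliper_sd * stdev(lp)  # SD over the whole sample (an assumption)
    available = sorted(i for i, t in enumerate(treated) if t == 0)
    pairs = []
    for i in (i for i, t in enumerate(treated) if t == 1):
        if not available:
            break
        j = min(available, key=lambda c: abs(lp[c] - lp[i]))
        if abs(lp[j] - lp[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement
        # otherwise this treated subject is left unmatched
    return pairs
```

Treated subjects with no untreated subject inside the caliper are dropped. The matched sample may therefore describe a narrower population than the full treated group.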

Propensity-score matching is increasingly being used to estimate the effects of treatments using observational data. In many-to-one (M:1) matching on the propensity score, M untreated subjects are matched to each treated subject using the propensity score. The authors used Monte Carlo simulations to examine the effect of the choice of M on the statistical performance of matched estimators. They considered matching 1–5 untreated subjects to each treated subject using both nearest-neighbor matching and caliper matching in 96 different scenarios. Increasing the number of untreated subjects matched to each treated subject tended to increase the bias in the estimated treatment effect; conversely, increasing the number of untreated subjects matched to each treated subject decreased the sampling variability of the estimated treatment effect. Using nearest-neighbor matching, the mean squared error of the estimated treatment effect was minimized in 67.7% of the scenarios when 1:1 matching was used. Using nearest-neighbor matching or caliper matching, the mean squared error was minimized in approximately 84% of the scenarios when, at most, 2 untreated subjects were matched to each treated subject. The authors recommend that, in most settings, researchers match either 1 or 2 untreated subjects to each treated subject when using propensity-score matching.

doi:10.1093/aje/kwq224

PMCID: PMC2962254
PMID: 20802241

bias (epidemiology); matching; Monte Carlo method; observational study; propensity score

Individuals differ not only in their background characteristics, but also in how they respond to a particular treatment, intervention, or stimulation. In particular, treatment effects may vary systematically by the propensity for treatment. In this paper, we discuss a practical approach to studying heterogeneous treatment effects as a function of the treatment propensity, under the same assumption commonly underlying regression analysis: ignorability. We describe one parametric method and two non-parametric methods for estimating interactions between treatment and the propensity for treatment. For the first method, we begin by estimating propensity scores for the probability of treatment given a set of observed covariates for each unit and construct balanced propensity score strata; we then estimate propensity score stratum-specific average treatment effects and evaluate a trend across them. For the second method, we match control units to treated units based on the propensity score and transform the data into treatment-control comparisons at the most elementary level at which such comparisons can be constructed; we then estimate treatment effects as a function of the propensity score by fitting a non-parametric model as a smoothing device. For the third method, we first estimate non-parametric regressions of the outcome variable as a function of the propensity score separately for treated units and for control units and then take the difference between the two non-parametric regressions. We illustrate the application of these methods with an empirical example of the effects of college attendance on women's fertility.

PMCID: PMC3591476
PMID: 23482633

causal effects; treatment effects; heterogeneity; propensity scores; matching
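The last step of the first method, evaluating a trend across stratum-specific treatment effects, can be sketched as a variance-weighted least-squares regression of the stratum estimates on stratum rank. The sketch assumes the stratum-specific estimates and their standard errors have already been computed. The numbers in the example are invented, and the function name is hypothetical.

```python
import math


def vwls_trend(levels, estimates, std_errors):
    """Slope (and its SE) from regressing effect estimates on stratum level,
    weighting each stratum by 1 / SE^2 (variances treated as known)."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, levels)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, estimates)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, levels))
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, levels, estimates))
    slope = sxy / sxx
    return slope, math.sqrt(1.0 / sxx)


# Invented stratum-specific effects for five propensity score strata (lowest to highest).
levels = [1, 2, 3, 4, 5]
effects = [0.30, 0.25, 0.20, 0.12, 0.05]
ses = [0.05, 0.04, 0.04, 0.05, 0.08]
slope, se_slope = vwls_trend(levels, effects, ses)
```

A negative slope here would indicate that subjects least likely to be treated benefit most. A slope near zero would be consistent with a homogeneous effect across strata.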

Summary

Methods based on the propensity score comprise one set of valuable tools for comparative effectiveness research and for estimating causal effects more generally. These methods typically consist of two distinct stages: 1) a propensity score stage where a model is fit to predict the propensity to receive treatment (the propensity score), and 2) an outcome stage where responses are compared in treated and untreated units having similar values of the estimated propensity score. Traditional techniques conduct estimation in these two stages separately; estimates from the first stage are treated as fixed and known for use in the second stage. Bayesian methods have natural appeal in these settings because separate likelihoods for the two stages can be combined into a single joint likelihood, with estimation of the two stages carried out simultaneously. One key feature of joint estimation in this context is “feedback” between the outcome stage and the propensity score stage, meaning that quantities in a model for the outcome contribute information to posterior distributions of quantities in the model for the propensity score. We provide a rigorous assessment of Bayesian propensity score estimation to show that model feedback can produce poor estimates of causal effects absent strategies that augment propensity score adjustment with adjustment for individual covariates. We illustrate this phenomenon with a simulation study and with a comparative effectiveness investigation of carotid artery stenting vs. carotid endarterectomy among 123,286 Medicare beneficiaries hospitalized for stroke in

doi:10.1111/j.1541-0420.2012.01830.x

PMCID: PMC3622139
PMID: 23379793

Bayesian estimation; causal inference; comparative effectiveness; model feedback; propensity score

Propensity score methods are increasingly being used to estimate the effects of treatments on health outcomes using observational data. There are four methods for using the propensity score to estimate treatment effects: covariate adjustment using the propensity score, stratification on the propensity score, propensity-score matching, and inverse probability of treatment weighting (IPTW) using the propensity score. When outcomes are binary, the effect of treatment on the outcome can be described using odds ratios, relative risks, risk differences, or the number needed to treat. Several clinical commentators suggested that risk differences and numbers needed to treat are more meaningful for clinical decision making than are odds ratios or relative risks. However, there is a paucity of information about the relative performance of the different propensity-score methods for estimating risk differences. We conducted a series of Monte Carlo simulations to examine this issue. We examined bias, variance estimation, coverage of confidence intervals, mean-squared error (MSE), and type I error rates. A doubly robust version of IPTW had superior performance compared with the other propensity-score methods. It resulted in unbiased estimation of risk differences, treatment effects with the lowest standard errors, confidence intervals with the correct coverage rates, and correct type I error rates. Stratification, matching on the propensity score, and covariate adjustment using the propensity score resulted in minor to modest bias in estimating risk differences. Estimators based on IPTW had lower MSE compared with other propensity-score methods. Differences between IPTW and propensity-score matching may reflect that these two methods estimate the average treatment effect and the average treatment effect for the treated, respectively. Copyright © 2010 John Wiley & Sons, Ltd.

doi:10.1002/sim.3854

PMCID: PMC3068290
PMID: 20108233

propensity score; observational study; binary data; risk difference; number needed to treat; matching; IPTW; inverse probability of treatment weighting; propensity-score matching
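The abstract reports that a doubly robust version of IPTW performed best for estimating risk differences. It also notes that the number needed to treat is the clinical counterpart of the risk difference. The abstract does not specify which doubly robust estimator was used. The sketch below uses the augmented IPW (AIPW) estimator as one well-known example. AIPW is consistent if either the propensity score model or the outcome model is correctly specified. It assumes that predictions m1 and m0 from a separate outcome model are supplied. All names are hypothetical.

```python
import math


def aipw_risk_difference(y, treated, ps, m1, m0):
    """Augmented IPW estimate of E[Y(1)] - E[Y(0)] for a binary outcome.

    m1[i], m0[i]: predicted risk for subject i with and without treatment,
    from an outcome model fitted elsewhere (assumed input)."""
    n = len(y)
    mu1 = sum(t * yi / e - (t - e) * a / e
              for yi, t, e, a in zip(y, treated, ps, m1)) / n
    mu0 = sum((1 - t) * yi / (1 - e) + (t - e) * b / (1 - e)
              for yi, t, e, b in zip(y, treated, ps, m0)) / n
    return mu1 - mu0


def number_needed_to_treat(risk_difference):
    """NNT = 1 / |risk difference| (conventionally rounded up when reported)."""
    return math.inf if risk_difference == 0 else 1.0 / abs(risk_difference)
```

If the outcome predictions are exactly right, the weighting terms cancel and the estimate equals the mean of m1 - m0 whatever the propensity scores are. This is one half of the double robustness property.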

The quality of propensity scores is traditionally measured by assessing how well they make the distributions of covariates in the treatment and control groups match, which we refer to as “good balance”. Good balance guarantees less biased estimates of the treatment effect. However, the cost of achieving good balance is that the variance of the estimates increases due to a reduction in effective sample size, either through the introduction of propensity score weights or dropping cases when propensity score matching. In this paper, we investigate whether it is best to optimize the balance or to settle for a less than optimal balance and use double robust estimation to adjust for remaining differences. We compare treatment effect estimates from regression, propensity score weighting, and double robust estimation with varying levels of effort expended to achieve balance using data from a study about the differences in outcomes by HIV status in heterosexually active homeless men residing in Los Angeles. Because of how costly data collection efforts are for this population, it is important to find an alternative estimation method that does not reduce effective sample size as much as methods that aggressively aim to optimize balance. Results from a simulation study suggest that there are instances in which we can obtain more precise treatment effect estimates without increasing bias too much by using a combination of regression and propensity score weights that achieve a less than optimal balance. There is a bias-variance tradeoff at work in propensity score estimation; every step toward better balance usually means an increase in variance and at some point a marginal decrease in bias may not be worth the associated increase in variance.

PMCID: PMC3433039
PMID: 22956891

Propensity score; Double robust estimation; HIV status; Homeless men
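The loss of effective sample size caused by weighting, which this abstract describes, is often summarized with Kish's approximation: (Σw)² / Σw². The abstract does not say which measure the authors used, so the formula below is offered only as one common choice. The function name is hypothetical.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

Equal weights give an effective sample size equal to n. The more variable the weights, the smaller the effective sample size, which is the variance cost of pursuing tighter balance.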

Pawar, Pushkar P. | Jones, Linda G. | Feller, Margaret | Guichard, Jason L. | Mujib, Marjan | Ahmed, Mustafa I. | Roy, Brita | Rahman, Toufiqur | Aban, Inmaculada B. | Love, Thomas E. | White, Michel | Aronow, Wilbert S. | Fonarow, Gregg C. | Ahmed, Ali
Tobacco smoking is a risk factor for atrial fibrillation (AF), but little is known about the impact of smoking in patients with AF. Of the 4060 patients with recurrent AF in the Atrial Fibrillation Follow-up Investigation of Rhythm Management (AFFIRM) trial, 496 (12%) reported having smoked during the past two years. Propensity scores for smoking were estimated for each of the 4060 patients using a multivariable logistic regression model and were used to assemble a matched cohort of 487 pairs of smokers and nonsmokers, who were balanced on 46 baseline characteristics. Cox and logistic regression models were used to estimate the associations of smoking with all-cause mortality and all-cause hospitalization, respectively, during more than 5 years of follow-up. Matched participants had a mean age of 70 ± 9 years (± SD), 39% were women, and 11% were non-white. All-cause mortality occurred in 21% and 16% of matched smokers and nonsmokers, respectively (hazard ratio [HR] for smokers compared with nonsmokers, 1.35; 95% confidence interval [CI], 1.01–1.81; p = 0.046). Unadjusted, multivariable-adjusted and propensity-adjusted HRs (95% CI) for all-cause mortality associated with smoking in the pre-match cohort were 1.40 (1.13–1.72; p = 0.002), 1.45 (1.16–1.81; p = 0.001), and 1.39 (1.12–1.74; p = 0.003), respectively. Smoking had no association with all-cause hospitalization (odds ratio [OR] for smokers compared with nonsmokers, 1.21; 95% CI, 0.94–1.57; p = 0.146). Among patients with AF, a recent history of smoking was associated with an increased risk of all-cause mortality, but had no association with all-cause hospitalization.

doi:10.1016/j.archger.2011.05.027

PMCID: PMC3358565
PMID: 21733581

Atrial fibrillation; Smoking; Mortality; Propensity score

A number of covariate-balancing methods, based on the propensity score, are widely used to estimate treatment effects in observational studies. If the treatment effect varies with the propensity score, however, different methods can give very different answers. The authors illustrate this effect by using data from a United Kingdom–based registry of subjects treated with anti–tumor necrosis factor drugs for rheumatoid arthritis. Estimates of the effect of these drugs on mortality varied from a relative risk of 0.4 (95% confidence interval: 0.16, 0.91) to a relative risk of 1.3 (95% confidence interval: 0.8, 2.25), depending on the balancing method chosen. The authors show that these differences were due to a combination of an interaction between propensity score and treatment effect and to differences in weighting subjects with different propensity scores. Thus, the methods are being used to calculate average treatment effects in populations with very different distributions of effect-modifying variables, resulting in different overall estimates. This phenomenon highlights the importance of careful selection of the covariate-balancing method so that the overall estimate has a meaningful interpretation.

doi:10.1093/aje/kwn391

PMCID: PMC2656533
PMID: 19153216

covariate balance; effect modification; observational study; propensity score; weighting
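The mechanism described in this abstract can be reproduced with a toy example. Suppose the treatment effect differs between low-propensity and high-propensity subjects. Weights targeting the average treatment effect (ATE) and weights targeting the average treatment effect in the treated (ATT) then average those effects over different populations and give different answers. The weight formulas are the standard ones. The data and the function names are invented for illustration.

```python
def ate_weight(t, e):
    """Weight targeting the average treatment effect in the whole population."""
    return 1.0 / e if t == 1 else 1.0 / (1.0 - e)


def att_weight(t, e):
    """Weight targeting the average treatment effect in the treated."""
    return 1.0 if t == 1 else e / (1.0 - e)


def weighted_effect(y, treated, ps, weight_fn):
    """Weighted mean outcome in treated minus weighted mean outcome in untreated."""
    means = {}
    for group in (0, 1):
        w = [weight_fn(t, e) for t, e in zip(treated, ps) if t == group]
        v = [yi for yi, t in zip(y, treated) if t == group]
        means[group] = sum(wi * vi for wi, vi in zip(w, v)) / sum(w)
    return means[1] - means[0]


# Hypothetical data: effect of 1 when e = 0.25 and effect of 3 when e = 0.75.
ps = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
y = [1.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0]
ate = weighted_effect(y, treated, ps, ate_weight)  # approximately 2.0: strata weighted equally
att = weighted_effect(y, treated, ps, att_weight)  # approximately 2.5: strata weighted by number treated
```

Neither answer is wrong. Each answers a different question, which is why the target population should be chosen before the balancing method.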

Background

Hospitalization due to worsening heart failure (HF) is common and is associated with high mortality. However, the effect of incident HF hospitalization (compared to no HF hospitalization) on subsequent mortality has not been studied in a propensity-matched population of chronic HF patients.

Methods and Results

In the Digitalis Investigation Group trial, 5501 patients had no HF hospitalizations (4512 alive at two years after randomization) and 1732 had HF hospitalizations during the first two years (1091 alive at two years). Propensity scores for incident HF hospitalization during the first two years after randomization were calculated for each patient and were used to match 1057 (97%) patients who had an HF hospitalization during the first two years with 1057 patients who had no HF hospitalization. We used matched Cox regression analysis to estimate the effect of incident HF hospitalization during the first two years after randomization on post-two-year mortality. Compared with 153 deaths (rate, 420/10,000 person-years) in the no HF hospitalization group, 334 deaths (rate, 964/10,000 person-years) occurred in the HF hospitalization group (hazard ratio 2.49; 95% confidence interval 1.97–3.13; p<0.0001). Hazard ratios (95% confidence intervals) for cardiovascular and HF mortality were 2.88 (2.23–3.74; p<0.0001) and 5.22 (3.34–8.15; p<0.0001), respectively.

Conclusions

Hospitalization due to worsening HF was associated with increased risk of subsequent mortality in ambulatory patients with chronic HF. These results highlight the importance of HF hospitalization as a marker of disease progression and poor outcomes in chronic HF, reinforcing the need for prevention of HF hospitalizations and strategies to improve post-discharge outcomes.

doi:10.1016/j.cardfail.2007.12.001

PMCID: PMC2771194
PMID: 18381184

Heart failure; hospitalization; mortality; propensity scores

Ekundayo, O. James | Adamopoulos, Chris | Ahmed, Mustafa I. | Pitt, Bertram | Young, James B. | Fleg, Jerome L. | Love, Thomas E. | Sui, Xuemei | Perry, Gilbert J. | Siscovick, David S. | Bakris, George | Ahmed, Ali
Background

Hypokalemia is common in heart failure (HF) and is associated with increased mortality. Potassium supplements are commonly used to treat hypokalemia and maintain normokalemia. However, their long-term effects on outcomes in chronic HF are unknown. We used a public-use copy of the Digitalis Investigation Group (DIG) trial dataset to determine the associations of potassium supplement use with outcomes using a propensity-matched design.

Methods

Of the 7788 DIG participants with chronic HF, 2199 were using oral potassium supplements at baseline. We estimated propensity scores for potassium supplement use for each patient and used them to match 2131 pairs of patients receiving and not receiving potassium supplements. Matched Cox regression models were used to estimate associations of potassium supplement use with mortality and hospitalization during 40 months of median follow-up.

Results

All-cause mortality occurred in 818 (rate, 1327/10,000 person-years) and 802 (rate, 1313/10,000 person-years) patients receiving and not receiving potassium supplements, respectively (hazard ratio [HR] when potassium supplement use was compared with nonuse, 1.05; 95% confidence interval [CI], 0.94–1.18; P=0.390). All-cause hospitalizations occurred in 1516 (rate, 4777/10,000 person-years) and 1445 (rate, 4120/10,000 person-years) patients receiving and not receiving potassium supplements, respectively (HR, 1.15; 95% CI, 1.05–1.26; P=0.004). HRs for hospitalizations due to cardiovascular causes and worsening HF were 1.19 (95% CI, 1.08–1.32; P=0.001) and 1.27 (95% CI, 1.12–1.43; P<0.0001), respectively.

Conclusion

The use of potassium supplements in chronic HF was not associated with mortality. However, their use was associated with increased hospitalization due to cardiovascular causes and progressive HF.

doi:10.1016/j.ijcard.2008.11.195

PMCID: PMC2900187
PMID: 19135741

Heart failure; potassium supplement; mortality; hospitalization; propensity score

Mediation analysis uses measures of hypothesized mediating variables to test theory for how a treatment achieves effects on outcomes and to improve subsequent treatments by identifying the most efficient treatment components. Most current mediation analysis methods rely on untested distributional and functional form assumptions for valid conclusions, especially regarding the relation between the mediator and outcome variables. Propensity score methods offer an alternative whereby the propensity score is used to compare individuals in the treatment and control groups who would have had the same value of the mediator had they been assigned to the same treatment condition. This article describes the use of propensity score weighting for mediation with a focus on explicating the underlying assumptions. Propensity scores have the potential to offer an alternative estimation procedure for mediation analysis with alternative assumptions from those of standard mediation analysis. The methods are illustrated investigating the mediational effects of an intervention to improve sense of mastery to reduce depression using data from the Job Search Intervention Study (JOBS II). We find significant treatment effects for those individuals who would have improved sense of mastery when in the treatment condition but no effects for those who would not have improved sense of mastery under treatment.

doi:10.1080/00273171.2011.576624

PMCID: PMC3293166
PMID: 22399826

Background

This research examined the use of the propensity score method to compare proxy-completed responses to self-completed responses in the first three baseline cohorts of the Medicare Health Outcomes Survey, administered in 1998, 1999, and 2000, respectively. A proxy is someone other than the respondent who completes the survey for the respondent.

Methods

The propensity score method of matched sampling was used to compare proxy and self-completed responses. A propensity score is a value that equals the estimated probability of a given individual belonging to a treatment group given the observed background characteristics of that individual. Proxy and self-completed responses were compared on demographics, the SF-36, chronic conditions, activities of daily living, and depression-screening questions. For each individual survey respondent, logistic regression was used to calculate the probability that this individual belonged to the proxy respondent group (propensity score). Pre- and post-adjustment differences were assessed by calculating effect sizes.

Results

Differences between self-completed and proxy-completed responses were substantially reduced with the use of the propensity score method. However, differences were still found in the SF-36, several demographics, several impaired activities of daily living, several chronic conditions, and one depression-screening question.

Conclusion

The propensity score method helped to reduce differences between proxy-completed and self-completed survey responses, thereby providing an approximation to a randomized controlled experiment of proxy-completed versus self-completed survey responses.

doi:10.1186/1477-7525-1-47

PMCID: PMC222919
PMID: 14570594

Propensity score; Medicare Health Outcomes Survey; elderly; proxy

Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating the effects of treatments when using observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used for assessing the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects using propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine the effect on inferences about risk differences (or absolute risk reductions) when statistical methods for independent samples are used compared with when statistical methods for paired samples are used in propensity-score matched samples. We found that compared with using methods for independent samples, the use of methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. Differences between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger compared with when treatment-selection process was weaker. We recommend using statistical methods for paired samples when using propensity-score matched samples for making inferences on the effect of treatment on the reduction in the probability of an event occurring. Copyright © 2011 John Wiley & Sons, Ltd.

doi:10.1002/sim.4200

PMCID: PMC3110307
PMID: 21337595

propensity score; propensity-score matching; risk difference; absolute risk reduction; Monte Carlo simulations; statistical inference; hypothesis testing; type I error rate; categorical data analysis
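The abstract contrasts independent-sample and paired-sample inference for a risk difference in a matched sample. The sketch below makes that contrast concrete. It is a minimal illustration, not the authors' simulation code, and rests on these assumptions:

- 1:1 matched pairs with a binary outcome.
- The paired standard error uses the standard Wald-type variance for a difference in paired proportions, which depends on the counts of discordant pairs.
- It is set beside the usual independent-samples variance for comparison.
- The function name `risk_difference_matched` and the example data are hypothetical.

```python
import math


def risk_difference_matched(treated, control):
    """Risk difference in 1:1 matched pairs, with paired and independent-sample SEs.

    treated[i] and control[i] are the 0/1 outcomes of the i-th matched pair.
    Returns (risk_difference, se_paired, se_independent).
    """
    if len(treated) != len(control) or not treated:
        raise ValueError("need the same, non-zero number of treated and control outcomes")
    n = len(treated)
    # Discordant pairs: b = treated had the event and control did not; c = the reverse.
    b = sum(1 for y1, y0 in zip(treated, control) if y1 == 1 and y0 == 0)
    c = sum(1 for y1, y0 in zip(treated, control) if y1 == 0 and y0 == 1)
    rd = (b - c) / n

    # Paired (McNemar-type) Wald variance: only discordant pairs carry information.
    var_paired = ((b + c) - (b - c) ** 2 / n) / n ** 2

    # Independent-samples variance, which ignores the matched structure.
    p1 = sum(treated) / n
    p0 = sum(control) / n
    var_indep = p1 * (1 - p1) / n + p0 * (1 - p0) / n
    return rd, math.sqrt(var_paired), math.sqrt(var_indep)


# Ten hypothetical matched pairs (illustrative data, not from the study).
treated = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
control = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
rd, se_pair, se_ind = risk_difference_matched(treated, control)
print(f"RD = {rd:.3f}, paired SE = {se_pair:.4f}, independent SE = {se_ind:.4f}")
```

With these toy pairs the risk difference is 0.2. The paired standard error (about 0.190) is smaller than the independent-samples one (about 0.214), because outcomes within a pair are positively correlated. This is the direction of the narrower intervals the abstract reports.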

Context

Comparisons of outcomes between patients treated and untreated in observational studies may be biased due to differences in patient prognosis between groups, often because of unobserved treatment selection biases.

Objective

To compare 4 analytic methods for removing the effects of selection bias in observational studies: multivariable model risk adjustment, propensity score risk adjustment, propensity-based matching, and instrumental variable analysis.

Design, Setting, and Patients

A national cohort of 122 124 patients who were elderly (aged 65–84 years), receiving Medicare, and hospitalized with acute myocardial infarction (AMI) in 1994–1995, and who were eligible for cardiac catheterization. Baseline chart reviews were taken from the Cooperative Cardiovascular Project and linked to Medicare health administrative data to provide a rich set of prognostic variables. Patients were followed up for 7 years through December 31, 2001, to assess the association between long-term survival and cardiac catheterization within 30 days of hospital admission.

Main Outcome Measure

Risk-adjusted relative mortality rate using each of the analytic methods.

Results

Patients who received cardiac catheterization (n=73 238) were younger and had lower AMI severity than those who did not. After adjustment for prognostic factors by using standard statistical risk-adjustment methods, cardiac catheterization was associated with a 50% relative decrease in mortality (for multivariable model risk adjustment: adjusted relative risk [RR], 0.51; 95% confidence interval [CI], 0.50–0.52; for propensity score risk adjustment: adjusted RR, 0.54; 95% CI, 0.53–0.55; and for propensity-based matching: adjusted RR, 0.54; 95% CI, 0.52–0.56). Using regional catheterization rate as an instrument, instrumental variable analysis showed a 16% relative decrease in mortality (adjusted RR, 0.84; 95% CI, 0.79–0.90). The survival benefits of routine invasive care from randomized clinical trials are between 8% and 21%.

Conclusions

Estimates of the observational association of cardiac catheterization with long-term AMI mortality are highly sensitive to analytic method. All standard risk-adjustment methods have the same limitations regarding removal of unmeasured treatment selection biases. Compared with standard modeling, instrumental variable analysis may produce less biased estimates of treatment effects, but is more suited to answering policy questions than specific clinical questions.

doi:10.1001/jama.297.3.278

PMCID: PMC2170524
PMID: 17227979
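The relative decreases quoted in the Results of this study follow directly from the relative risks: a relative risk (RR) below 1 corresponds to a relative decrease in mortality of 1 − RR. As a worked check on the reported figures:

```latex
\begin{aligned}
\text{relative decrease} &= (1 - \mathrm{RR}) \times 100\% \\
\text{multivariable risk adjustment: } (1 - 0.51) \times 100\% &= 49\% \approx 50\% \\
\text{propensity score adjustment or matching: } (1 - 0.54) \times 100\% &= 46\% \\
\text{instrumental variable analysis: } (1 - 0.84) \times 100\% &= 16\%
\end{aligned}
```

The three standard risk-adjustment estimates (46% to 49%) therefore lie well above the 8% to 21% benefit reported in randomized trials. The instrumental variable estimate of 16% falls within that range.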

Group care programs are often criticized for producing poor outcomes, especially in light of community-based alternatives like treatment foster care that have a stronger evidence base. In this study, data from Girls and Boys Town were used to compare outcomes of youth in treatment foster care (n=112) and group care (n=716) using propensity score matching, a method that can minimize selection bias in nonrandomized designs. Eighteen background covariates were used to develop propensity scores for the likelihood of receiving treatment foster care rather than group care. Several matching methods generated balanced samples on which the outcomes were compared. The results showed that group care youth were more likely to be favorably discharged, more likely to return home, and less likely to experience a subsequent placement in the first six months after discharge. Legal involvement and residence in a home-like environment at follow-up did not differ between the groups. These positive outcomes for group care youth suggest that family-style group care programs may be effective.

doi:10.1016/j.childyouth.2007.12.002

PMCID: PMC2515489
PMID: 19122763

treatment foster care; family-style group care; propensity score matching
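The abstract does not say how the propensity scores were estimated from the 18 background covariates. The sketch below assumes logistic regression of treatment on the covariates, the most common choice. Its design choices and assumptions are:

- The model is fitted by plain batch gradient ascent on the log-likelihood, so only the Python standard library is needed. In practice one would use a statistics package's logistic regression routine instead.
- The function name `fit_propensity_scores` is hypothetical, and its `lr` and `n_iter` defaults are illustrative rather than tuned.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity_scores(X, t, lr=0.05, n_iter=20000):
    """Estimate P(treated | covariates) with a logistic regression.

    X: list of covariate rows (numbers, no intercept column).
    t: list of 0/1 treatment indicators.
    Fitted by batch gradient ascent on the log-likelihood; lr and n_iter are
    illustrative defaults for small, well-scaled toy data, not tuned values.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend an intercept column
    beta = [0.0] * len(rows[0])
    for _ in range(n_iter):
        p = [_sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Gradient of the log-likelihood with respect to beta: X^T (t - p).
        grad = [sum((ti - pi) * r[j] for ti, pi, r in zip(t, p, rows))
                for j in range(len(beta))]
        beta = [b + lr * g for b, g in zip(beta, grad)]
    # Fitted propensity scores for every subject.
    return [_sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]


# One hypothetical covariate (e.g. a risk score) for six youths.
X = [[0], [1], [2], [3], [4], [5]]
t = [0, 0, 1, 0, 1, 1]
scores = fit_propensity_scores(X, t)
print([round(s, 3) for s in scores])
```

At the maximum-likelihood fit of a logistic model with an intercept, the fitted scores sum to the number of treated subjects. This gives a quick check that the iterations have converged.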

Methods for estimating average treatment effects under the assumption of no unmeasured confounders include regression models; propensity score adjustments using stratification, weighting, or matching; and doubly robust estimators (a combination of both). Researchers continue to debate the best estimator for outcomes such as health care cost data, which are usually characterized by an asymmetric distribution and heterogeneous treatment effects. The challenges of finding the right specification for regression models are well documented in the literature. Propensity score estimators have been proposed as alternatives that overcome these challenges. Using simulations, we find that in moderate-size samples (n = 5000), balancing on propensity scores estimated from saturated specifications can balance the covariate means across treatment arms but fails to balance higher-order moments and covariances among covariates. Therefore, although, unlike regression models, they do not require a formal model for outcomes, propensity score estimators can be inefficient at best and biased at worst for health care cost data. Our simulation study, designed to take a ‘proof by contradiction’ approach, shows that no one estimator can be considered the best under all data-generating processes for outcomes such as costs. The inverse-propensity weighted estimator is the most likely to be unbiased under alternate data-generating processes, but it is prone to bias under misspecification of the propensity score model and is inefficient compared with an unbiased regression estimator. Our results show that there are no ‘magic bullets’ when it comes to estimating treatment effects on health care costs. Care should be taken before naively applying any one estimator to estimate average treatment effects in these data. We illustrate the performance of alternative methods in a cost dataset on breast cancer treatment.

doi:10.1007/s10742-011-0072-8

PMCID: PMC3244728
PMID: 22199462

Propensity score; non-linear regression; average treatment effect; health care costs
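The inverse-propensity weighted (IPW) estimator discussed in the cost study weights each subject by the inverse of the probability of the treatment actually received. The sketch below shows the normalized (Hájek-type) form for the average treatment effect. It assumes the propensity scores have already been estimated and lie strictly between 0 and 1. The function name `ipw_ate` and the data are hypothetical. As the abstract cautions, the estimate is only as good as the propensity score model, and scores near 0 or 1 produce large, unstable weights.

```python
def ipw_ate(t, y, e):
    """Normalized inverse-probability-weighted estimate of the average treatment effect.

    t: 0/1 treatment indicators; y: outcomes (e.g. costs); e: propensity scores in (0, 1).
    """
    if not all(0.0 < ei < 1.0 for ei in e):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    # Treated subjects get weight 1/e; untreated subjects get weight 1/(1 - e).
    w1 = [ti / ei for ti, ei in zip(t, e)]
    w0 = [(1 - ti) / (1 - ei) for ti, ei in zip(t, e)]
    # Weighted outcome mean in each arm, with the weights normalized to sum to one.
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0


# Hypothetical data: four patients, costs in thousands of dollars.
t = [1, 0, 1, 0]
y = [10.0, 0.0, 4.0, 2.0]
e = [0.8, 0.8, 0.2, 0.2]
print(ipw_ate(t, y, e))
```

Here the unweighted difference in means is (10 + 4)/2 − (0 + 2)/2 = 6. Weighting shifts the estimate to 4.8, because the treated patient with the low propensity score (0.2) and the untreated patient with the high one (0.8) are up-weighted.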

Propensity score methods are increasingly being used to reduce or minimize the effects of confounding when estimating the effects of treatments, exposures, or interventions when using observational or non-randomized data. Under the assumption of no unmeasured confounders, previous research has shown that propensity score methods allow for unbiased estimation of linear treatment effects (e.g., differences in means or proportions). However, in biomedical research, time-to-event outcomes occur frequently. There is a paucity of research into the performance of different propensity score methods for estimating the effect of treatment on time-to-event outcomes. Furthermore, propensity score methods allow for the estimation of marginal or population-average treatment effects. We conducted an extensive series of Monte Carlo simulations to examine the performance of propensity score matching (1:1 greedy nearest-neighbor matching within propensity score calipers), stratification on the propensity score, inverse probability of treatment weighting (IPTW) using the propensity score, and covariate adjustment using the propensity score to estimate marginal hazard ratios. We found that both propensity score matching and IPTW using the propensity score allow for the estimation of marginal hazard ratios with minimal bias. Of these two approaches, IPTW using the propensity score resulted in estimates with lower mean squared error when estimating the effect of treatment in the treated. Stratification on the propensity score and covariate adjustment using the propensity score result in biased estimation of both marginal and conditional hazard ratios. Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time-to-event outcomes. Copyright © 2012 John Wiley & Sons, Ltd.

doi:10.1002/sim.5705

PMCID: PMC3747460
PMID: 23239115

propensity score; survival analysis; inverse probability of treatment weighting (IPTW); Monte Carlo simulations; observational study; time-to-event outcomes
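The matching scheme named in the time-to-event study is 1:1 greedy nearest-neighbor matching within propensity score calipers. The sketch below illustrates it and is not the authors' code. It assumes:

- Distances are measured on the logit of the propensity score.
- Treated subjects are processed in input order and matched without replacement.
- A treated subject stays unmatched when no untreated subject lies within the caliper.
- The function names and the caliper value in the example are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def greedy_caliper_match(ps_treated, ps_control, caliper):
    """1:1 greedy nearest-neighbor matching without replacement.

    Distance is the absolute difference in the logit of the propensity score.
    Returns (treated_index, control_index) pairs; treated subjects with no
    untreated subject within the caliper are left unmatched.
    """
    lc = [logit(p) for p in ps_control]
    used = [False] * len(lc)
    pairs = []
    for i, p in enumerate(ps_treated):  # treated subjects in input order
        x = logit(p)
        best_j, best_d = None, math.inf
        for j, z in enumerate(lc):
            d = abs(x - z)
            if not used[j] and d < best_d:  # nearest still-available control
                best_j, best_d = j, d
        if best_j is not None and best_d <= caliper:
            used[best_j] = True
            pairs.append((i, best_j))
    return pairs


ps_treated = [0.50, 0.30, 0.90]
ps_control = [0.52, 0.10, 0.31]
print(greedy_caliper_match(ps_treated, ps_control, caliper=0.2))  # [(0, 0), (1, 2)]
```

The third treated subject (score 0.90) has no untreated subject within the caliper, so it is dropped. A caliper of 0.2 standard deviations of the logit of the propensity score is often recommended, but here it is simply passed in. Because the algorithm is greedy, the matches can depend on the order in which treated subjects are processed.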

Health and psychosocial service needs that may be co-morbid with opioid addiction may impede the success of drug treatment among patients attending methadone maintenance treatment programs (MMTPs). This longitudinal panel study investigates whether receiving services from one or more helping professionals outside the MMTP benefits drug treatment outcomes in a random sample of male MMTP patients (N = 356). Each participant was interviewed 3 times, 6 months apart. Because this observational study did not use random assignment, propensity score matching was used to strengthen the causal validity of the effect estimates. Results support the hypothesis that receiving additional off-site services significantly increases the likelihood of abstaining from cocaine, heroin, and any illicit drug use over both the ensuing 6-month and 12-month periods. These findings indicate that receipt of additional medical and/or psychosocial services enhances the efficacy of methadone treatment in increasing abstinence from illicit drug use.

doi:10.1016/j.evalprogplan.2009.11.001

PMCID: PMC2891366
PMID: 20034671

methadone; health services; drug abstinence; opioid-related disorders

Takara, Ayako | Ogawa, Hiroshi | Endoh, Yasuhiro | Mori, Fumiaki | Yamaguchi, Jun-ichi | Takagi, Atsushi | Koyanagi, Ryo | Shiga, Tsuyoshi | Kasanuki, Hiroshi | Hagiwara, Nobuhisa

Background

The long-term prognosis of diabetic patients with acute myocardial infarction (AMI) treated by acute revascularization is uncertain, and the optimal pharmacotherapy for such cases has not been fully evaluated.

Methods

To elucidate the long-term prognosis and prognostic factors in diabetic patients with AMI, a prospective cohort study of 3021 consecutive AMI patients was conducted. All patients discharged alive from hospital were followed up annually to monitor their prognosis. The primary endpoint was all-cause mortality, and the secondary endpoint was the occurrence of major cardiovascular events. To examine the effect of various factors on the long-term prognosis of AMI patients with diabetes, the patients were divided into two groups matched by propensity scores and analyzed retrospectively.

Results

Diabetes was diagnosed in 1102 patients (36.5%). During the index hospitalization, coronary angioplasty and coronary thrombolysis were performed in 58.1% and 16.3% of patients, respectively. In-hospital mortality was comparable between diabetic and non-diabetic AMI patients (9.2% and 9.3%, respectively). In total, 2736 patients (90.6%) were discharged alive and followed for a median of 4.2 years (follow-up rate, 96.0%). Long-term survival was worse in the diabetic group than in the non-diabetic group, but the difference was not statistically significant (hazard ratio, 1.20 [0.97-1.49], p = 0.09). In contrast, AMI patients with diabetes had a significantly higher incidence of cardiovascular events than the non-diabetic group (1.40 [1.20-1.64], p < 0.0001). Multivariate analysis revealed three factors significantly associated with favorable late outcomes in diabetic AMI patients: acute revascularization (HR, 0.62), prescription of aspirin (HR, 0.27), and prescription of renin-angiotensin system (RAS) inhibitors (HR, 0.53). Late outcome was not significantly associated with prescription of beta-blockers (HR, 0.97) or calcium channel blockers (HR, 1.27). Standard Japanese-approved doses of statins were associated with a favorable outcome in AMI patients with diabetes, but this association was not statistically significant (0.67 [0.39-1.06], p = 0.11).

Conclusions

Although diabetic patients with AMI have more frequent adverse events than non-diabetic patients with AMI, the present results suggest that acute revascularization and standard therapy with aspirin and RAS inhibitors may improve their prognosis.

doi:10.1186/1475-2840-9-1

PMCID: PMC2815698
PMID: 20047694

While newer antibiotics play a key role in treating methicillin-resistant Staphylococcus aureus (MRSA) infections, knowledge of their real-world clinical impact is limited. We sought to quantify the effectiveness of linezolid compared with vancomycin among MRSA-infected patients. This national retrospective cohort study included adult patients who were admitted to any of the Veterans Affairs hospitals between January 2002 and June 2008, infected with MRSA, and treated with either linezolid (oral or intravenous [i.v.]) or vancomycin (i.v.). Patients were followed from their treatment initiation date until the event of interest, discharge, death, or December 2008. Using propensity score methods with Cox proportional hazards models, we estimated the treatment effects of linezolid primarily on time to discharge and secondarily on time to all-cause in-hospital mortality, therapy discontinuation, and all-cause 90-day readmission. We identified 20,107 patients treated with linezolid (3.2%) or vancomycin (96.8%). Baseline covariates were well balanced by treatment group within propensity score quintiles and between propensity score matched patients (626 pairs). The discharge rate was significantly higher among patients treated with linezolid, reflecting a shorter length of stay, in both the propensity score adjusted (hazard ratio [HR], 1.38; 95% confidence interval [95% CI], 1.27 to 1.50) and matched (HR, 1.70; 95% CI, 1.44 to 2.00) analyses. A significantly lower rate of therapy discontinuation, indicating longer therapy duration, was observed in the linezolid group (adjusted HR, 0.64; 95% CI, 0.54 to 0.75; matched HR, 0.49; 95% CI, 0.36 to 0.65). In this clinical population of MRSA-infected patients, linezolid therapy was as effective as vancomycin therapy with respect to in-hospital survival and readmission.

doi:10.1128/AAC.00200-10

PMCID: PMC2944576
PMID: 20660681
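The linezolid study checked covariate balance within propensity score quintiles. The sketch below illustrates that step under these assumptions:

- The propensity scores are already estimated.
- Balance is measured with the standardized difference (difference in means divided by the pooled standard deviation), a widely used diagnostic. The abstract does not say which diagnostic was used.
- The function names and the example data are hypothetical.

```python
import bisect
import statistics


def ps_quintiles(scores):
    """Return a propensity score quintile (0 = lowest, 4 = highest) for each subject."""
    cuts = statistics.quantiles(scores, n=5)  # four cut points between the quintiles
    return [bisect.bisect_right(cuts, s) for s in scores]


def standardized_difference(x_treated, x_control):
    """Difference in means of a continuous covariate divided by the pooled SD.

    Assumes the covariate varies within at least one group (otherwise the SD is 0).
    """
    pooled_var = (statistics.variance(x_treated) + statistics.variance(x_control)) / 2
    return (statistics.mean(x_treated) - statistics.mean(x_control)) / pooled_var ** 0.5


def balance_by_quintile(scores, treated, covariate):
    """Standardized difference of one covariate within each propensity score quintile."""
    strata = ps_quintiles(scores)
    result = {}
    for q in range(5):
        xt = [x for x, s, t in zip(covariate, strata, treated) if s == q and t == 1]
        xc = [x for x, s, t in zip(covariate, strata, treated) if s == q and t == 0]
        if len(xt) >= 2 and len(xc) >= 2:  # statistics.variance needs two or more values
            result[q] = standardized_difference(xt, xc)
    return result


# Twenty hypothetical patients: scores, alternating treatment, arbitrary covariate values.
scores = [(i + 0.5) / 20 for i in range(20)]
treated = [i % 2 for i in range(20)]
age = [60 + (7 * i) % 11 for i in range(20)]
for q, d in balance_by_quintile(scores, treated, age).items():
    print(f"quintile {q + 1}: standardized difference = {d:+.2f}")
```

Some authors treat an absolute standardized difference above 0.1 as a sign of meaningful imbalance. In a real analysis this check would be repeated for every baseline covariate, with binary covariates handled by the analogous formula for proportions.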