Recent overviews have described the use of propensity scores in medical research and compared estimates of exposure-outcome relationships obtained from propensity score methods with those obtained from multivariate models [10,11]. A systematic literature search [11] found an exponential increase in the use of propensity scores over the past several years (see the figure below). From a baseline of 6 to 9 published papers using these methods in 1998 through 2000, annual numbers of publications using propensity score methods increased to 39, 51 and 71 in 2001, 2002 and 2003, respectively. Among 177 published studies that used propensity score methods to evaluate the relationship of a dichotomous exposure with an outcome, medications were the most common treatment studied (34% of studies), followed by surgical interventions (28%), interventional catheterization (7%), and other medical procedures and lifestyle factors.
Figure: Frequency of Publications Using Propensity Score Methods by Year
The reason for the sharp increase in the use of propensity scores over the past few years is unclear. Frequently cited presentations of the method to clinical audiences and researchers [1,12] may have influenced its uptake.
Published studies have increasingly used both propensity score methods and regression models to evaluate the relationship between an exposure and an outcome, and reviews have compared the resulting estimates [10,11]. A limitation of comparing estimates from conventional multivariate models with those based on control for the propensity score is that the approaches used to model confounding variables, and the methods used to construct and model the propensity score, vary widely across studies and are sometimes not fully described. Nonetheless, comparisons of estimated drug effects from multivariate models versus propensity score analyses can shed light on the performance of these approaches in real applications. Among 78 exposure-outcome associations in 43 studies evaluated by both propensity scores and regression models, statistical significance differed between the two methods in only 8 (10%) cases [10]. The propensity score methods tended to give estimates slightly closer to the null. Another comparison of 69 studies that reported results from both propensity score and regression model approaches found that in only 9 (13%) did all propensity score estimates differ by more than 20% from the regression model estimates [11]. Thus, there is little evidence that propensity score and regression model estimates give substantially different answers in actual usage.
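The 20% criterion described above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the difference is measured relative to the regression estimate on the original (for example, odds ratio) scale, which the reviews may or may not have done, and the study names and estimates are hypothetical. The last two lines simply recompute the percentages reported in the text.

```python
def relative_difference(ps_estimate, regression_estimate):
    """Absolute difference of the propensity score estimate from the
    regression estimate, as a fraction of the regression estimate.
    Assumption: the comparison is made on the original effect scale."""
    return abs(ps_estimate - regression_estimate) / abs(regression_estimate)


def study_is_discordant(estimate_pairs, threshold=0.20):
    """A study counts as discordant only if ALL of its propensity score
    estimates differ from the regression estimates by more than threshold."""
    return all(
        relative_difference(ps, reg) > threshold for ps, reg in estimate_pairs
    )


# Hypothetical (propensity score, regression) effect estimates per study.
hypothetical_studies = {
    "study_a": [(0.80, 0.78), (1.45, 1.10)],  # only one pair differs by > 20%
    "study_b": [(0.50, 0.70)],                # the single pair differs by > 20%
    "study_c": [(0.62, 0.60)],                # the single pair agrees closely
}

discordant = [
    name for name, pairs in hypothetical_studies.items()
    if study_is_discordant(pairs)
]

# Percentages reported in the text, recomputed from the stated counts.
percent_significance_differs = round(100 * 8 / 78)  # 8 of 78 associations
percent_all_over_20 = round(100 * 9 / 69)           # 9 of 69 studies
```

Requiring every estimate in a study to differ is a strict rule, so a study with one large and one small discrepancy (such as the hypothetical `study_a`) is not counted as discordant.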
Simulation studies offer the ability to compare analytic approaches in a setting where the true relationships are known. Cook and Goldman compared estimates based on propensity scores, regression models, and disease risk scores and found generally comparable performance of the three methods [14]. They noted exaggerated levels of statistical significance in analyses based on propensity scores and disease risk scores in settings with a high correlation between exposure and confounders. Generally, propensity score methods were more robust to such high correlations than disease risk scores.
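The logic of such a simulation, where the true effect is fixed in advance and the estimators are judged against it, can be sketched in a few lines. This is a deliberately simplified illustration and not the design used by Cook and Goldman: it assumes a single binary confounder, a continuous outcome, and a propensity score estimated as the observed proportion treated at each confounder level. All parameter values are arbitrary choices.

```python
import random
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, true_effect=1.0, seed=0):
    """Generate (confounder, treatment, outcome) rows with a known effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if c else 0.2          # confounder strongly drives exposure
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 2.0 * c + rng.gauss(0.0, 1.0)
        rows.append((c, t, y))
    return rows


def crude_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    untreated = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(untreated)


def ps_stratified_difference(rows):
    """Stratify on the estimated propensity score and average the
    within-stratum differences, weighted by stratum size."""
    treatment_by_c = defaultdict(list)
    for c, t, _ in rows:
        treatment_by_c[c].append(t)
    # Estimated propensity score: proportion treated at each confounder level.
    ps = {c: mean(ts) for c, ts in treatment_by_c.items()}

    strata = defaultdict(list)
    for c, t, y in rows:
        strata[ps[c]].append((t, y))

    estimate = 0.0
    for members in strata.values():
        treated = [y for t, y in members if t == 1]
        untreated = [y for t, y in members if t == 0]
        weight = len(members) / len(rows)
        estimate += weight * (mean(treated) - mean(untreated))
    return estimate


rows = simulate()
crude = crude_difference(rows)             # confounded, well above the truth
adjusted = ps_stratified_difference(rows)  # should be close to true_effect
```

With these settings the crude contrast is expected to be near 2.2, because treated subjects carry the confounder far more often, while the stratified estimate should fall close to the true value of 1.0.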
Cepeda and colleagues focused their simulation studies on settings with small numbers of events relative to the number of potential confounders [15]. This is particularly relevant to pharmacoepidemiology, where one often studies rare outcomes that occur in patients with multiple risk factors and many possible indications and contraindications for drug use. They found that with fewer than eight events per confounder, analyses based on propensity scores yielded estimates that were less biased, more robust, and more precise than those from logistic regression. By contrast, with larger numbers of events per confounder, propensity score methods had poorer coverage than regression methods. These results are entirely consistent with the known poor performance of regression models with small numbers of events per variable [16], and indicate an important situation where propensity score methods are clearly preferred [17].
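As a rough aid to reading this result, the events-per-confounder ratio can be computed before choosing an analytic approach. The helper names below are hypothetical, and treating eight as a hard cutoff is a simplification of the simulation findings rather than an established rule.

```python
def events_per_confounder(n_events, n_confounders):
    """Number of outcome events divided by the number of confounders."""
    if n_confounders <= 0:
        raise ValueError("n_confounders must be positive")
    return n_events / n_confounders


def favored_by_simulation(n_events, n_confounders, threshold=8.0):
    """Approach that performed better in the simulations described above.
    Assumption: the threshold of eight is applied as a strict cutoff."""
    ratio = events_per_confounder(n_events, n_confounders)
    return "propensity score" if ratio < threshold else "regression"


# Hypothetical study: 40 events and 10 candidate confounders.
rare_outcome = favored_by_simulation(40, 10)      # ratio 4.0
# Hypothetical study: 200 events and 10 candidate confounders.
common_outcome = favored_by_simulation(200, 10)   # ratio 20.0
```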
Another important topic evaluated in simulation studies is the impact of omitted covariates on the performance of estimates based on the propensity score. Often, available databases with detailed information on drug use either lack information on an important covariate or can measure it only crudely. Drake [2] showed that omitted covariates produce bias in propensity score estimates comparable to the bias they produce in regression model estimates. She further demonstrated that failure to specify the response model correctly induces greater bias than incorrect specification of the propensity score, and that the propensity score does not balance the distributions of omitted covariates between treated and untreated subjects.
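The last point, that a propensity score balances only the covariates used to build it, can be illustrated with a small simulation. This sketch is not Drake's design: it assumes two independent binary covariates, `x1` (measured) and `x2` (omitted), both of which influence treatment, and it estimates the propensity score from `x1` alone as the observed proportion treated. Variable names and parameter values are illustrative assumptions.

```python
import random
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, seed=1):
    """Generate (x1, x2, treatment) rows; both covariates affect treatment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)
        x2 = int(rng.random() < 0.5)
        p_treat = 0.2 + 0.3 * x1 + 0.4 * x2
        t = int(rng.random() < p_treat)
        rows.append((x1, x2, t))
    return rows


def within_stratum_imbalance(rows, covariate_index):
    """Size-weighted difference in the covariate mean (treated minus
    untreated) within strata of a propensity score built from x1 only."""
    treatment_by_x1 = defaultdict(list)
    for x1, _, t in rows:
        treatment_by_x1[x1].append(t)
    # The propensity score deliberately ignores x2, the omitted covariate.
    ps = {x1: mean(ts) for x1, ts in treatment_by_x1.items()}

    strata = defaultdict(list)
    for row in rows:
        strata[ps[row[0]]].append(row)

    imbalance = 0.0
    for members in strata.values():
        treated = [r[covariate_index] for r in members if r[2] == 1]
        untreated = [r[covariate_index] for r in members if r[2] == 0]
        weight = len(members) / len(rows)
        imbalance += weight * (mean(treated) - mean(untreated))
    return imbalance


rows = simulate()
imbalance_x1 = within_stratum_imbalance(rows, 0)  # included covariate
imbalance_x2 = within_stratum_imbalance(rows, 1)  # omitted covariate
```

Within strata of the estimated score, `x1` is balanced by construction, whereas the omitted `x2` remains markedly more common among treated subjects.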