1. Theoretical advantages

While analyses based on propensity scores often give similar estimates to those from regression models, and the balance in observed covariates can give the false sense that unobserved covariates are also balanced, propensity scores offer important theoretical advantages in pharmacoepidemiology. Confounding by indication is often the main challenge to validity in pharmacoepidemiology and the propensity score focuses directly on the indications for use and non-use of the drug under study. Patients with contraindications to use of a drug (or those with absolute indications) may have no comparable exposed subjects (or unexposed subjects) for valid estimation of relative or absolute differences in outcomes. These subjects are not usually recognized with conventional response modeling and might be influential due to effect measure modification or model misspecification. Graphical comparison of propensity scores in exposed versus unexposed subjects can identify these areas of non-overlap that are otherwise difficult to describe in a multivariate setting with many factors influencing treatment decisions (see for an illustration).
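The check for non-overlap described above can be sketched numerically as well as graphically. This is a minimal illustration, assuming propensity scores have already been estimated for each subject. The function names and score values are hypothetical and are not taken from the studies discussed here.

```python
def overlap_region(ps_exposed, ps_unexposed):
    """Return (low, high): the range of scores observed in BOTH groups."""
    low = max(min(ps_exposed), min(ps_unexposed))
    high = min(max(ps_exposed), max(ps_unexposed))
    return low, high


def count_outside(scores, low, high):
    """Count subjects whose score falls outside the closed range [low, high]."""
    return sum(1 for s in scores if s < low or s > high)


# Hypothetical estimated propensity scores (illustrative values only).
exposed = [0.15, 0.30, 0.45, 0.60, 0.85, 0.97]
unexposed = [0.02, 0.04, 0.10, 0.20, 0.35, 0.55, 0.70]

low, high = overlap_region(exposed, unexposed)
n_exposed_outside = count_outside(exposed, low, high)
n_unexposed_outside = count_outside(unexposed, low, high)

print(f"region of overlap: [{low:.2f}, {high:.2f}]")
print(f"exposed subjects without comparable unexposed: {n_exposed_outside}")
print(f"unexposed subjects without comparable exposed: {n_unexposed_outside}")
```

Subjects counted outside the region correspond to the patients with near-absolute indications or contraindications. A min/max rule of this kind is sensitive to single extreme scores, so it complements rather than replaces plotting the two distributions.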

The propensity score has direct scientific interest in studies that focus on determinants of drug initiation or persistence with therapy. Consideration of the propensity score can broaden one’s perspective to include barriers to treatment. For example, frailty and comorbidity that are difficult to measure in large databases can lead to decreased use of preventive drug therapies. Table 1 shows several markers of frailty and comorbidity that are related to decreased propensity to use lipid-lowering drugs among older residents of New Jersey. Recognition of the importance of such factors and their inclusion in propensity scores can lead to improved control for confounding, relative to an analysis that does not control for these factors. Further, understanding the role such factors can play in drug use is of fundamental interest in pharmacoepidemiology, and the propensity score naturally focuses on this issue.

**Table 1.** Correlates of lower propensity to use lipid-lowering drugs

2. Value of propensity scores for matching or trimming the population

Matching or stratification on the propensity score offers several advantages relative to inclusion of an estimated linear propensity score in a conventional multivariable model. First, a matched analysis will eliminate those exposed subjects (e.g. those with absolute indications for therapy) with no comparable controls, as well as those unexposed subjects with measurable contraindications. Second, matched or stratified analyses do not make strong assumptions of linearity in the relationship of propensity with the outcome. Third, and perhaps most importantly, a matched data set allows for a simple, transparent analysis.

The balancing property of the propensity score has implications for optimal matching strategies in both cohort and cross-sectional studies in pharmacoepidemiology. Matching on the propensity score will outperform other matching strategies with many covariates in the sense that optimal balance of covariates will be achieved between exposed and unexposed groups^{18}. The balance achieved in prospective studies will mimic that of randomization but, of course, will hold only for variables that are measured and included in the propensity score.
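One simple way to match on the propensity score is greedy 1:1 nearest-neighbour matching within a caliper. The sketch below is one possible strategy, not the specific algorithm used in the cited work. The function name, the caliper of 0.05, and the subject scores are assumptions made for illustration.

```python
def greedy_caliper_match(exposed, unexposed, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    exposed, unexposed: dicts mapping subject id -> estimated propensity score.
    caliper: largest allowed score difference within a pair (assumed value).
    Returns a list of (exposed_id, unexposed_id) pairs.
    """
    available = dict(unexposed)  # unexposed subjects not yet used
    pairs = []
    # Process exposed subjects in order of increasing propensity score.
    for e_id, e_ps in sorted(exposed.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining unexposed subject on the propensity score.
        u_id = min(available, key=lambda k: abs(available[k] - e_ps))
        if abs(available[u_id] - e_ps) <= caliper:
            pairs.append((e_id, u_id))
            del available[u_id]
        # Otherwise the exposed subject stays unmatched and is excluded.
    return pairs


# Hypothetical subjects (illustrative scores only).
exposed = {"e1": 0.20, "e2": 0.50, "e3": 0.95}
unexposed = {"u1": 0.05, "u2": 0.22, "u3": 0.48, "u4": 0.55}

pairs = greedy_caliper_match(exposed, unexposed)
print(pairs)
```

In this toy example the exposed subject with a score of 0.95 has no unexposed subject within the caliper and is dropped. That is the matched-analysis counterpart of excluding patients with absolute indications for therapy.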

A limitation of matching is that many unexposed subjects not matched to exposed subjects, and possibly some unmatched exposed subjects, are excluded from analysis and this can lead to a loss of information and a decrease in the precision of the estimated association between the drug and the outcome. As an alternative, one can trim the population for analysis through exclusion of those subjects in the two tails of the propensity score distribution where overlap between those who use and do not use the drug of interest may be limited. This can be viewed as a principled approach to eliminate extreme observations that may be unduly influential and problematic in a multivariate analysis because of minimal covariate overlap between exposed and unexposed subjects. The reduction of the population for analysis is appropriate if the excluded subjects are those who are not candidates for drug therapy, or possibly if the other tail of the distribution consists entirely of people with an absolute need for the drug. Although trimming has these theoretical advantages, optimal trimming strategies (e.g. exclusion of the extreme 1% or 2% of the propensity score distributions) are unknown.
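Trimming can be sketched as follows. Because optimal trimming strategies are unknown, the rule used here is an assumption made for illustration: it drops the same percentage from each tail of the pooled score distribution. The function name and the data are hypothetical.

```python
def trim_tails(scores, percent=1):
    """Drop subjects in the lowest and highest `percent` of the pooled scores.

    scores: dict mapping subject id -> estimated propensity score.
    percent: whole-number percentage removed from EACH tail (assumed rule).
    Returns a dict containing only the retained subjects.
    """
    ordered = sorted(scores, key=scores.get)  # ids by increasing score
    k = len(ordered) * percent // 100         # subjects dropped per tail
    kept = ordered[k:len(ordered) - k]
    return {sid: scores[sid] for sid in kept}


# Hypothetical population of 100 subjects with evenly spread scores.
scores = {i: i / 101 for i in range(1, 101)}

trimmed = trim_tails(scores, percent=2)
print(len(scores), "subjects before trimming,", len(trimmed), "after")
```

Other rules are possible, such as cut points based on the score distribution within the exposed or the unexposed group. The text above does not favour any one of them.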

3. Improved estimation with few outcomes

As previously noted, one common setting in pharmacoepidemiology where use of the propensity score can provide clearly improved estimates of drug effects occurs when one has relatively few outcomes compared with the number of potentially important covariates. In this setting, reliable estimation of many parameters in multivariate models is not possible because maximum likelihood estimation requires many outcomes per included parameter in a model^{19}. Use of the propensity score provides an effective way to reduce the dimensionality of the covariates before modeling. The rule of eight proposed by Cepeda et al (fewer than eight outcomes per included covariate)^{15} gives a helpful guideline on when use of the propensity score should effectively improve estimation.
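The rule of eight reduces to a simple ratio check. The sketch below assumes that each covariate corresponds to one model parameter. The function names and the counts in the example are hypothetical.

```python
def outcomes_per_covariate(n_outcomes, n_covariates):
    """Number of outcome events available per covariate in the model."""
    return n_outcomes / n_covariates


def propensity_score_suggested(n_outcomes, n_covariates, threshold=8):
    """True when there are fewer than `threshold` outcomes per covariate."""
    return outcomes_per_covariate(n_outcomes, n_covariates) < threshold


# Hypothetical studies, each with 15 candidate confounders.
few_outcomes = propensity_score_suggested(n_outcomes=60, n_covariates=15)
many_outcomes = propensity_score_suggested(n_outcomes=400, n_covariates=15)

print("60 outcomes, 15 covariates:", few_outcomes)
print("400 outcomes, 15 covariates:", many_outcomes)
```

With 60 outcomes and 15 covariates there are only four outcomes per covariate, which is the setting where summarising the covariates in a single score is expected to help.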

4. Propensity score by treatment interactions

Consideration of the propensity score focuses on the real possibility that the effectiveness of a drug may vary according to the strength of the indication for its use. Among patients with weak indication for use, or among those with contraindications for use, a drug may provide no benefit or even be harmful, while in patients with clear indications for use, the drug may provide substantial benefits. These clinically relevant concerns are frequently overlooked in analyses of pharmacoepidemiologic studies, but the propensity score provides a natural perspective to elucidate them.

The example of Kurth et al^{20} illustrates the relevance of this perspective for pharmacoepidemiology. They studied the effect of treatment with tissue plasminogen activator (t-PA) on in-hospital mortality among 6,269 ischemic stroke patients in Westphalia. Their population included some treated patients with low propensity to receive treatment and small numbers of untreated patients with a high propensity to receive t-PA. Stratified analysis by levels of the propensity score revealed heterogeneity in efficacy, perhaps due to side effects of treatment. Treated patients with low propensity to receive t-PA had substantially elevated death rates relative to untreated patients. However, among those with propensity to receive t-PA above 5%, the relative odds of death in treated versus untreated patients were approximately 1. It is unclear how this anticipated interaction would be identified outside the framework of the propensity score, if it arises from a combination of factors.
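A stratified analysis of this kind can be sketched as stratum-specific odds ratios computed within levels of the propensity score. The counts below are invented for illustration and are not the data of Kurth et al. The function names are hypothetical, and the cut point of 0.05 simply echoes the 5% threshold mentioned above.

```python
from bisect import bisect_right


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table, or None when it is undefined.

    a: treated deaths, b: treated survivors,
    c: untreated deaths, d: untreated survivors.
    """
    if b * c == 0:
        return None
    return (a * d) / (b * c)


def stratified_odds_ratios(records, cut_points):
    """Odds ratio of death (treated vs untreated) in each propensity stratum.

    records: iterable of (propensity, treated, died) tuples.
    cut_points: ascending boundaries; a score equal to a boundary falls
    in the higher stratum.
    """
    tables = [[0, 0, 0, 0] for _ in range(len(cut_points) + 1)]
    for ps, treated, died in records:
        stratum = bisect_right(cut_points, ps)
        if treated:
            cell = 0 if died else 1
        else:
            cell = 2 if died else 3
        tables[stratum][cell] += 1
    return [odds_ratio(*table) for table in tables]


# Invented counts (NOT from any cited study): low and high propensity strata.
records = (
    [(0.02, True, True)] * 6 + [(0.02, True, False)] * 14
    + [(0.02, False, True)] * 50 + [(0.02, False, False)] * 450
    + [(0.20, True, True)] * 10 + [(0.20, True, False)] * 90
    + [(0.20, False, True)] * 20 + [(0.20, False, False)] * 180
)

ors = stratified_odds_ratios(records, cut_points=[0.05])
print("stratum-specific odds ratios:", ors)
```

Odds ratios that differ across strata, as in this toy data set, suggest a propensity score by treatment interaction. A formal analysis would also need confidence intervals and a test of heterogeneity, which this sketch omits.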

5. Propensity score calibration to correct for measurement error

In almost all pharmacoepidemiologic studies, some covariates are either not measured or measured with error. Neither standard applications of propensity scores nor use of regression models may adequately adjust for such unmeasured or mis-measured covariates. However, it may be possible to obtain a more reliable estimate of the propensity score in a sub-study with more detailed covariate information and then use this gold-standard propensity score to correct the main-study effect of the drug on the outcome^{21}. One can view this approach as an application of regression calibration to correct for the measurement error in the main study propensity score that is available for all study subjects^{22}. Use of propensity score calibration allows one to account for multiple unobserved confounders that may have available information only for a subgroup of study subjects.
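The regression-calibration idea can be sketched algebraically. Assume the validation study supports a linear model for the gold-standard score given exposure X and the error-prone score, E[PS_gold | X, PS_ep] = lam_0 + lam_x * X + lam_ps * PS_ep. Assume also that the main study yields coefficients beta_x and beta_ps from an outcome model containing X and PS_ep. Substituting the first model into the outcome model gives the corrected coefficients computed below. The function name, symbols, and numeric inputs are assumptions made for illustration. They are not results from the cited studies, and variance estimation is omitted.

```python
import math


def calibrate(beta_x, beta_ps, lam_x, lam_ps):
    """Regression-calibration correction of main-study coefficients.

    beta_x, beta_ps: main-study coefficients (log relative risk scale) for
        exposure and for the error-prone propensity score.
    lam_x, lam_ps: validation-study coefficients from a linear regression of
        the gold-standard score on exposure and the error-prone score.
    Returns (corrected exposure coefficient, corrected score coefficient).
    """
    if lam_ps == 0:
        raise ValueError("error-prone score carries no information (lam_ps = 0)")
    beta_ps_corrected = beta_ps / lam_ps
    beta_x_corrected = beta_x - lam_x * beta_ps_corrected
    return beta_x_corrected, beta_ps_corrected


# Hypothetical inputs (illustrative only, not estimates from any study).
beta_x = math.log(0.81)   # naive log relative risk for the drug
beta_ps = -1.0            # naive coefficient of the error-prone score
lam_x, lam_ps = 0.10, 0.5

beta_x_corr, beta_ps_corr = calibrate(beta_x, beta_ps, lam_x, lam_ps)
print(f"naive RR: {math.exp(beta_x):.2f}")
print(f"calibrated RR: {math.exp(beta_x_corr):.2f}")
```

The correction is only as good as the linear calibration model and the assumption that the error-prone score adds no information about the outcome once the gold-standard score is known.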

To illustrate the method, consider a study of the relationship of use of non-steroidal anti-inflammatory drugs (NSAIDs) with 1-year mortality in a large cohort of older people^{21}. The main study follows 103,133 residents of New Jersey aged 65 or older for 1 year. As is common in database studies, one has information on drug use, mortality and many determinants that allow for estimation of the propensity to use NSAIDs. However, other potentially important determinants of NSAID use, including cigarette smoking, non-prescription aspirin use, body mass index and education, may be available in a smaller, separate study such as the Medicare Current Beneficiary Survey (MCBS). The available data elements from these sources are illustrated in . One can estimate both the error-prone and the gold-standard propensity score in the validation sample that also contains information on NSAID use but is too small for reliable evaluation or lacks information on the outcome of 1-year mortality.

Analyses based only on the main study data found a significant 20% reduction in mortality among NSAID users in a multivariable regression model (RR 0.80; 95% CI: 0.77–0.83) that was virtually unchanged upon control for the error-prone propensity score available in the main study (RR 0.81; 95% CI: 0.78–0.84). A similar protective effect of NSAIDs on mortality was observed in a prior observational study and could not be explained by available measures of confounding variables^{23}. Application of propensity score calibration, based on the relationship of the gold-standard propensity score with the error-prone propensity score and actual NSAID use in the validation study, resulted in a more plausible RR of 1.06 (95% CI: 1.00–1.12).

Propensity score calibration illustrates the magnitude of the bias that can arise from uncontrolled confounding. Propensity score calibration relies on the often unverifiable assumption inherent in corrections based on regression calibration that the error-prone propensity score is independent of the outcome given the gold-standard propensity score. If this assumption does not hold, propensity score calibration can increase bias in some scenarios^{21}. The approach may perform better with internal validation studies where detailed information on confounders is available for a sample of the subjects included in the main study.