In this analysis of data from an observational, retrospective cohort study of initial maintenance therapies for COPD, we demonstrated the similarity of results using two analytic approaches to observational research. Specifically, we compared results from a PSM analysis with those from a previously published, parallel MR analysis.13
We found that both methods yielded similar health care utilization and cost outcomes. General agreement between MR and PSM methods has been found in other studies. In a review of 177 comparative method studies, Sturmer concluded that substantial changes in treatment effects were seen when point estimates were calculated with and without adjustment for covariates, but that the method of adjustment itself – MR or PSM – made little difference.1
In PSM, a high degree of propensity score overlap after matching is desirable in terms of internal validity. When overlap is minimal, unmeasured confounding bias in treatment groups probably cannot be resolved using either MR or PSM techniques.1
In the present PSM analysis, large proportions of patients in both the TIO and IPR cohorts (89.1% and 80.2%, respectively) were matched to FSC patients, and there was substantial overlap in propensity scores between groups. In other words, matching produced cohorts with similar baseline characteristics. In general, the few statistically significant differences remaining after matching were minor in terms of effect size and practical significance, and had small absolute standardized differences. Characteristics that would be expected to considerably skew utilization outcomes, such as comorbid cardiovascular disease, were not different between matched groups.
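As an illustration of the balance diagnostic mentioned above, the following minimal sketch computes absolute standardized differences for a binary and a continuous covariate. It assumes the commonly used pooled-variance formulas; the source does not state which formula was applied, and the function names and input values are hypothetical.

```python
import math


def std_diff_binary(p1: float, p2: float) -> float:
    """Absolute standardized difference for a binary covariate.

    p1 and p2 are the prevalences in the two matched groups.
    Assumes the pooled-variance form; not confirmed by the source.
    """
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    if pooled_var == 0:
        return 0.0  # no variation in either group
    return abs(p1 - p2) / math.sqrt(pooled_var)


def std_diff_continuous(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Absolute standardized difference for a continuous covariate."""
    pooled_var = (sd1 ** 2 + sd2 ** 2) / 2
    if pooled_var == 0:
        return 0.0
    return abs(mean1 - mean2) / math.sqrt(pooled_var)


# Hypothetical values, for illustration only (not study data).
d_binary = std_diff_binary(0.50, 0.40)
d_age = std_diff_continuous(65.0, 10.0, 64.0, 10.0)
print(f"binary covariate: {d_binary:.3f}, continuous covariate: {d_age:.3f}")
```

Unlike a p-value, this measure does not depend on sample size, which is why a statistically significant difference in a large matched cohort can still correspond to a small standardized difference. A value below 0.1 is a commonly cited rule of thumb for adequate balance.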
Outcomes differed slightly between the PSM and MR analyses. Some of these differences may be due to the smaller PSM sample, because the patients excluded by matching help explain the lower PSM utilization and cost estimates. Whereas the MR analysis was a population-based study, the matching process in the PSM analysis excluded some younger FSC-treated individuals with minimal comorbidities and some older, sicker individuals treated with IPR or TIO. Exclusion of the older, sicker individuals resulted in lower mean costs for IPR and TIO patients, while costs for FSC patients were quite similar in both analyses.
Event frequency may also be a factor in the differences in findings between analysis methods. Multivariable logistic regression and propensity matching have been found to produce similar results when events are reasonably frequent.23
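To illustrate how matching can change cohort composition, the sketch below performs 1:1 greedy nearest-neighbor matching on propensity scores with a caliper. This is an assumed, simplified algorithm for illustration; the source does not describe the matching procedure at this level of detail, and the patient identifiers, scores, and caliper width are hypothetical.

```python
def greedy_match(comparison, reference, caliper):
    """1:1 greedy nearest-neighbor matching without replacement.

    comparison, reference: dicts mapping patient id -> propensity score.
    caliper: maximum allowed absolute difference in propensity score.
    Returns (matched pairs, unmatched comparison ids).
    """
    available = dict(reference)  # reference patients not yet used
    pairs, unmatched = [], []
    for pid, score in comparison.items():
        if not available:
            unmatched.append(pid)
            continue
        # Closest remaining reference patient by propensity score.
        best = min(available, key=lambda rid: abs(available[rid] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((pid, best))
            del available[best]  # each reference patient is used once
        else:
            unmatched.append(pid)  # no acceptable match: patient is excluded
    return pairs, unmatched


# Hypothetical propensity scores (not study data).
comparison = {"t1": 0.30, "t2": 0.55, "t3": 0.95}
reference = {"c1": 0.28, "c2": 0.57, "c3": 0.10}

pairs, unmatched = greedy_match(comparison, reference, caliper=0.05)
matched_share = len(pairs) / len(comparison)
print(pairs, unmatched, f"{matched_share:.0%} matched")
```

In this toy example, the comparison patient with an extreme propensity score ("t3") and the reference patient with a very low score ("c3") both drop out of the matched sample, which mirrors how patients at either end of the propensity distribution can be excluded by matching.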
Through simulations, Cepeda and colleagues found that the use of propensity scores yielded less biased estimates than multivariable logistic regression only when there were eight or fewer modeled events per covariate.26
When the number of events per covariate was higher, multivariable logistic regression was the better method. Other studies have determined that ten events per covariate is a desirable ratio when using maximum likelihood methods.27
The main outcomes in our analyses (COPD-related outpatient visits associated with an antibiotic or oral corticosteroid, ED visits, and hospitalizations), although of great clinical concern, are statistically infrequent when averaged across a large population of COPD patients that is unrestricted in terms of disease severity. Our original MR analysis included 44 covariates. The outcome of outpatient visit with an oral corticosteroid had the smallest number of events per covariate modeled: 877 of 32,338 patients had at least one event, which translates to approximately 20 events per covariate in the multivariable logistic regression analysis. This compares to 65 events per covariate for the combined endpoint of hospitalization/ED visit. The lower ratios of events per covariate for some outcomes may have been a factor in the different findings of the two analyses for ORs and IRRs. Costs (COPD-related medical service costs and pharmacy costs), on the other hand, were universally incurred. Both analyses found that, compared to FSC, TIO and IPR were associated with higher costs for COPD-related medical services and higher total costs, even though the costs associated with TIO and IPR were lower in the PSM analysis.
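The events-per-covariate figure for the oral corticosteroid outcome can be checked with simple arithmetic. The sketch below uses the counts reported above (877 patients with at least one event, 44 covariates) and compares the ratio with the thresholds of eight and ten events per covariate cited earlier; the function and variable names are hypothetical.

```python
def events_per_covariate(n_events: int, n_covariates: int) -> float:
    """Ratio of outcome events to covariates in the regression model."""
    if n_covariates <= 0:
        raise ValueError("n_covariates must be positive")
    return n_events / n_covariates


# Counts reported in the text for outpatient visits with an oral corticosteroid.
N_EVENTS_OCS = 877
N_COVARIATES = 44

# Thresholds taken from the studies cited in the text.
PS_ADVANTAGE_MAX = 8      # propensity scores less biased at or below this ratio
ML_DESIRED_MIN = 10       # desired ratio for maximum likelihood methods

epv_ocs = events_per_covariate(N_EVENTS_OCS, N_COVARIATES)
print(f"events per covariate: {epv_ocs:.1f}")
print("above propensity-score advantage range:", epv_ocs > PS_ADVANTAGE_MAX)
print("meets desired ratio for maximum likelihood:", epv_ocs >= ML_DESIRED_MIN)
```

The ratio works out to just under 20, so even the least frequent outcome exceeds both cited thresholds, although by a much smaller margin than the combined hospitalization/ED endpoint.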
Both MR and PSM methods adjust estimated associations between treatments and outcomes to reduce potential bias from observed covariates. Other researchers have reported that results from the two methods appear to be consistent when there is large overlap between groups in propensity for a given treatment, which ensures minimal loss of observations, and when outcomes can be modeled with a relatively large number of events per covariate.1
Our findings support this view and suggest that, with regard to less frequent events, in particular when effect sizes may be small, consideration should be given to analyzing outcomes using both methods, assuming a large proportion of subjects can be matched.
While PSM is a more transparent method, in the sense that it allows one to see the degree of equality between groups after matching, in this study, PSM provided little advantage over MR in terms of the validity of the results. Because of the inevitable reduction in sample size and change in overall composition of treatment groups being compared, the choice of whether to use PSM or MR will depend on the question being investigated, whether a population effect is being measured, and whether review of a non-representative population of patients receiving treatment is acceptable (or even preferred). This point is addressed by D’Agostino, who recommended that when patients are excluded in matched analyses, researchers need to be particularly clear in their descriptions of the included and excluded patients, and of the populations to which study results are applicable.28
Strengths of this study include the large sample sizes of both analyses and the high degree of propensity matching, with approximately 80% and 89% of the original IPR and TIO cohorts matched, respectively, to FSC patients. The exclusion of some original subjects from the PSM cohorts due to lack of a match does mean, however, that any additional information these subjects might have provided was lost and that statistical power was reduced. Some limitations should be considered in interpreting the results. We measured exacerbations using claims data, defining exacerbations as COPD-related health care events. Using an alternative definition of exacerbation based on symptoms, lung function, or other clinical parameters could influence observed effect sizes.14
However, we would not expect a different definition of exacerbations to influence effect sizes differently for MR than for PSM, or to change the overall findings of this study. Because both MR and propensity-matched analyses attempt to reduce bias through covariate adjustment, their ability to do so depends on the capture of all relevant factors. In this analysis, some potential confounders were unmeasured. As this was an observational study utilizing administrative claims data, information about patients’ clinical status was not available. We could not directly ascertain lung function status, disease severity, or other clinical characteristics. Therefore, residual baseline differences between treatment groups may remain. However, we did adjust for two key characteristics of interest – disease severity and exacerbation frequency – by using prior COPD-related health care and pharmacy utilization (particularly oral corticosteroids/antibiotics) as proxy measures.