|Home | About | Journals | Submit | Contact Us | Français|
Little information is available comparing methods to adjust for confounding when considering multiple drug exposures. We compared three analytic strategies to control for confounding based on measured variables: conventional multivariable, exposure propensity score (EPS) and disease risk score (DRS).
Each method was applied to a dataset (2000–2006) recently used to examine the comparative effectiveness of four drugs. The relative effectiveness of risedronate, nasal calcitonin and raloxifene in preventing nonvertebral fracture, were each compared to alendronate. EPSs were derived both by using multinomial logistic regression (single model EPS) and by three separate logistic regression models (separate model EPS). DRSs were derived and event rates compared using Cox proportional hazard models. DRSs derived among the entire cohort (full cohort DRS) was compared to DRSs derived only among the referent alendronate (unexposed cohort DRS).
Less than 5 percent deviation from the base estimate (conventional multivariable) was observed applying single model EPS, separate model EPS or full cohort DRS. Applying the unexposed cohort DRS when background risk for fracture differed between comparison drug exposure cohorts resulted in −7% to +13% deviation from our base estimate.
With sufficient numbers of exposed and outcomes, either conventional multivariable, EPS or full cohort DRS may be used to adjust for confounding to compare the effects of multiple drug exposures. However, our data also suggest that unexposed cohort DRS may be problematic when background risks differ between referent and exposed groups. Further empirical and simulation studies will help to clarify the generalizability of our findings.
Results from comparative effectiveness studies facilitate decision making by physicians and their patients, and help payers optimize the value of healthcare expenditures.1 Pharmacoepidemiologists often use one or more of three types of multivariable statistical techniques to adjust for several potential confounders when comparing drug effects: 1) conventional adjustment by fitting multivariable regression models predicting the outcome of interest that include terms for the exposure variables of primary interest, and terms for each of the potential measured confounders (conventional multivariable), 2) adjustment using a confounder summary score created by fitting multivariable regression models predicting the exposures of interest that include terms for all potential measured confounders (exposure propensity score, EPS), and/or 3) adjustment using a confounder summary score created by fitting multivariable regression models predicting the disease outcome of interest that include terms for all potential measured confounders (disease risk score, DRS).2–6 These different analytic strategies have recently been summarized and contrasted in pharmacoepidemiologic analyses that compare the effects of two drug exposures, with generally little difference in results observed between methods applied.7, 8 Limited information is available that compare these methods when considering multiple drug exposures, as is the case in comparative effectiveness research. Multiple drug exposures increase the complexity of EPS estimation, with no effect on DRS estimation. Conclusions drawn in the setting of multiple drug exposures may thus be different than in previous assessments. We sought to compare the main three analytic strategies employed by pharmacoepidemiologists to control for potential confounding factors (conventional multivariable, EPS and DRS) in the context of multiple drug exposures. We illustrate the comparison by applying each method to a dataset recently used to examine the comparative effectiveness of four osteoporosis drugs in preventing nonvertebral fracture.9
We studied new users (no use in previous 365 days)10 of four agents approved to treat osteoporosis among enrollees in two state-wide (New Jersey and Pennsylvania) pharmaceutical benefits programs.9 We restricted inclusion to those with at least one pharmacy and Medicare claim in each of the 3 preceding 6-month periods. This was to ensure a minimum of 12 months of enrollment without any use of the study drugs prior to defining new use. We then restricted inclusion to the period of time when all agents of interest were available (starting April 1, 2000). At the time of analysis, we had complete data through December 31, 2003 for New Jersey, and through December 31, 2005 for Pennsylvania. The Partners HealthCare Institutional Review Board approved this project. Data Use Agreements are in place from PACE and the Centers for Medicare and Medicaid Services.
The exposures in this study were risedronate, nasal calcitonin, raloxifene and alendronate, the most widely prescribed osteoporosis treatment. Each exposure was compared to alendronate, which was considered the referent or “unexposed” in all analyses.
In this study, the outcome was the occurrence of first nonvertebral fracture (hip, humerus, radius or ulna) within 12 months of treatment initiation. Fractures were defined using previously validated criteria requiring diagnostic and procedural codes from Medicare claims.11 We divided numbers of fractures by person time to arrive at fracture incidence rates.
Covariates included factors plausibly related to risk for fracture and were identified by pharmacy and Medicare claims within 365 days prior to treatment initiation. We also included calendar time (month and year) of index prescription to adjust for potential secular trends in coding and prescribing. Covariate definitions and coding are available in the online supplement to the prior paper (www.annals.org).9
Our example dataset introduced subtle, but important analytic issues that need consideration. First, we studied new users from two states (two data sources), each with different time periods of observation based on available data, from April 1, 1999 through to December 31, 2003 in New Jersey and through to December 31, 2005 for Pennsylvania. The number of dummy variables to indicate the index date (month and year) of treatment initiation was therefore different between states. Second, we compared three distinct and divergent cohorts with the referent cohort of alendronate initiators: one cohort with similar indications and therefore background risk for fracture outcomes based on measured covariates (risedronate initiators), one cohort with greater background risk (nasal calcitonin initiators, e.g. calcitonin users were older and had a higher prevalence of fractures versus alendronate initiators), and one cohort with lower background risk (raloxifene initiators, e.g., raloxifene users were younger and had a lower prevalence of fracture versus alendronate initiators). Our example thus provides the opportunity to examine the implications of applying the different analytic strategies when background risk between cohorts of drug initiators is similar or different.
We used Cox proportional hazard models to compare nonvertebral fracture rates between agents by deriving relative rates (as hazard ratios, HRs). We stratified all Cox models by state (Pennsylvania and New Jersey), and where possible, used the same covariates in applying each method. Table 1 summarizes the number of covariates included in each adjusted model. In the current study, we compared adjustment for measured covariates using conventional multivariable analysis, with adjustment using two different methods of deriving EPS, and with adjustment using two different methods of deriving DRS, Table 2.
We fit a single multivariable Cox proportional hazards model stratified by state that included terms for the exposure variables of interest (3 dummy variables: risedronate, calcitonin, raloxifene) and a term for each covariate, for a total of 116 covariate terms.
We developed state-specific EPS in two ways:
We used one multinomial logistic regression model in each state to create 3 distinct EPSs; one for risedronate versus alendronate, one for calcitonin versus alendronate and one for raloxifene versus alendronate. We then adjusted for confounding by including EPS quintiles (12 EPS dummies; four for each of the three EPSs) in a single Cox proportional hazard model, stratified by state. We therefore used one multinomial logistic regression model per state to create all 3 state-specific EPSs, and then one single Cox proportional hazard model, stratified by state, to compare event rates between exposures.
We first defined three contrast cohorts within each state: 1) risedronate and alendronate initiators, 2) calcitonin and alendronate initiators, and 3) raloxifene and alendronate initiators. Second, we used logistic regression to create state-specific EPSs within each contrast cohort: risedronate versus alendronate, calcitonin versus alendronate and raloxifene versus alendronate (three separate logistic regression models per state). Third, we ran three separate Cox proportional hazard models stratified by state, one for each of the three contrast cohorts, and adjusted for respective EPS quintiles using 4 dummy variables. We therefore defined state-specific EPSs and compared event rates between dichotomous exposures within each of the three contrast cohorts. Each comparison was restricted to patients with overlapping propensity scores, defined by the common range of EPSs between the highest EPS value among alendronate users (referent, or “unexposed”), and the lowest EPS among the exposed (risedronate, calcitonin or raloxifene).
We developed two separate state-specific DRSs using nonvertebral fracture (hip, humerus, radius or ulna) within 12 months as the outcome:
State-specific DRSs were derived using the entire population by including all potential confounders and 3 dummy terms for exposure that identified risedronate, calcitonin or raloxifene initiators, into a single Cox proportional hazard model. We then calculated a DRS for each person by multiplying the regression coefficients from each state-specific DRS model equation times the individual covariate values for each person, but excluded exposure terms from the calculation; thereby setting exposure terms to zero for each person, even if actually exposed to risedronate, calcitonin or raloxifene. In summary, we derived and calculated state-specific DRSs using the full (entire) cohort (exposed and unexposed) within each state.
To evaluate how well fracture rates increased across DRSs, we plotted event rates (nonvertebral fractures within 12-months) for each drug exposure by full cohort DRS quintile.
We repeated the above process, but restricted derivation of the state-specific DRS equations to the referent cohort (alendronate), hereafter referred to as the “unexposed” cohort. We then used regression coefficients from this model to calculate individual DRSs for each person in the full cohort. DRS comparison models were restricted to patients with overlapping DRS.
To evaluate how well fracture rates increased across DRSs, we plotted event rates for each drug exposure by unexposed cohort DRS quintile.
To examine the performance of each analytic strategy, we calculated the amount of difference observed from a base estimate. Pragmatically, the base estimate was defined using conventional multivariable regression. We also compared our published primary result that controlled for confounding using the single model EPS (12 EPS dummies) as well as age groups, fracture history, race and diagnosis of osteoporosis to the base estimate.9 The amount of deviation observed from the base estimate was calculated as the difference between the observed hazard ratio and the base estimate hazard ratio, divided by the base estimate hazard ratio, multiplied by 100.7, 12–14
Overall, interpretation of results was similar based on each analytic strategy (Table 3). In each case, we observed no difference in nonvertebral fractures rates between risedronate or raloxifene and alendronate, and higher event rates among calcitonin compared with alendronate initiators. The magnitude of the hazard ratio varied little by method when comparing risedronate to alendronate (from 0.98 to 1.01), <3% deviation. Hazard ratios also varied little when comparing results using EPS methods with the conventional multivariable model, <4% deviation. However, greater than 5% deviation was introduced from our base estimate when applying DRSs methods of adjustment. The deviation was particularly large using the unexposed DRS method, “overestimating” the ratio of fracture rates between calcitonin and alendronate (hazard ratio of 1.56 versus a base estimate of 1.38, 13% deviation), and “underestimating” the ratio of fracture rates between raloxifene and alendronate (hazard ratio of 1.08 versus a base estimate of 1.17, −7% deviation). Few patients were excluded due to non-overlap of confounder summary scores: <1% using DRSs and <4% using EPSs. Results were similar when DRSs and separate model EPS analyses included “non-overlap” (data not shown).
Fracture rates increased 10-fold by full cohort DRS quintile, from 0.6 per 100 person-years in the lowest full cohort DRS strata, to 6 per 100 person years in the highest full cohort DRS strata (Figure 1). Although fracture rates also increased from lowest to highest unexposed cohort DRS quintile, rates did not increase monotonically by quintile for each exposure. In fact, fracture rates were lower in the second strata than in the first unexposed DRS strata among calcitonin and raloxifene recipients.
There is a growing interest in studying the comparative effectiveness and safety of drugs used to treat the same condition.1 We compared different methods to adjust for potential confounding using measured variables when studying multiple drug exposures. Consistent with prior studies that considered different analytic strategies to compare effects of dichotomous exposures,8, 15, 16 we found that adjusting for confounding using exposure propensity scores (EPS) results in similar estimates to those obtained from adjustments based on conventional multivariable regression. As highlighted previously, one advantage of EPS over conventional multivariable regression is in reducing the number of covariate terms in the outcome model.17, 18 Thus, EPS may be a particularly relevant method when exposures are prevalent but disease is rare.8, 19
We also documented comparable results using single model EPS (multinomial logistic regression) versus separate model EPS (derived using separate logistic regression models). This is consistent with the prior finding that a series of separate simple logistic regression analyses may be used to replace polychotomous logistic regression because the asymptotic relative efficiencies of predicted probabilities are generally high.20, 21 The potential advantages of using single model EPS over separate model EPS, or vice versa, may be pragmatic. Creating EPSs using a single multinomial logistic regression and then adjusting for EPSs in a single Cox model may seem simpler than running three separate logistic models and three separate Cox models. However, multinomial logistic regression requires more computational power, particularly if controlling for numerous covariates. The advantage of running fewer models in applying single model EPS versus separate model EPS also comes at the expense of complexity in trying to restrict analyses to EPS overlap. In our study, few subjects were excluded due to non-overlap and thus it is not surprising that results were similar when non-overlap patients were included in single model EPS adjustment. Restricting analyses to EPS overlap in other settings, however, has been shown to be critical.19, 22 For example, when studying the effects of biologics among rheumatoid arthritis patients, a few untreated subjects can become influential.22 Future work to help better understand the intricacies of restricting a single Cox regression model to overlap across two or more EPSs (high-dimensional overlap), is needed. The less complex (separate model EPSs) method that is as valid as the more complex multinomial model EPS, may generally be preferred because it is simpler to describe, evaluate, and restrict analyses to EPS overlap.
We found similar results across each method of adjustment (conventional, EPS, DRS) when the background risk for disease was also similar (risedronate and alendronate). However, adjusting for potential confounding using unexposed cohort DRS when drug exposures had different background fracture risk (equation derived only among “unexposed,” i.e., alendronate), introduced from −7% to 13% deviation in estimates. The problem is likely due to model extrapolation. When estimating the DRS in alendronate users, the range of disease risk predicted in the model is restricted to that observed among alendronate users. If the risk is higher or lower in any of the exposed cohorts (as it is here), the model will be extrapolated by definition, and thus be more prone to bias due to model misspecification. In our example dataset, calcitonin initiators had a higher background risk and raloxifene initiators a lower background risk for fracture compared to our referent alendronate initiators. We then observed deviation away from the null among those with higher background risk (calcitonin initiators were “sicker” than referent), and deviation toward the null among those with a lower background risk (raloxifene initators were “healthier” than referent). Applying unexposed cohort DRS when background risks differ from the referent attributes these differences in background risk to drug effects.
The potential for bias is problematic when a risk model is derived in a cohort whose disease risk differs from those to whom the risk model is extrapolated. This is not limited to studies that examine the comparative effectiveness of drugs, and may be even more problematic when studying drug effects compared to non-users (true unexposed), since disease risk among patients who were not prescribed the drug under study may be substantially different from the risk among those who are not treated, i.e., confounding by indication. The problem of generalizability of risk score models beyond the original cohort used to derive the model is not unique to pharmacoepidemiology. The importance of validating disease risk scores is standard in the application of prediction rules.23 Nonetheless, use of unexposed DRS may be advantageous over EPS when studying multi-level exposures with few observations at one or more levels, and may be advantageous over full cohort DRS in settings with true unexposed individuals, which otherwise may require strong assumptions about effect modification.24 Further studies that explore the benefits and limitations of unexposed DRS are thus warranted. Further empirical and simulation studies will help to clarify the generalizability of our findings.
Our study is limited by not having a gold standard estimate of the true difference in fracture risk between agents. We therefore used conventional multivariable regression and also report deviations from this estimate rather than bias. Regardless, defining any one of conventional or EPS methods as the base estimate would not change our finding that the unexposed DRS method results in different estimates when comparing agents among recipients with divergent risk profiles. Nonetheless, we report results from a single dataset and thus further study is important to examine the generalizability of our findings. Simulation methods may help to clarify scenarios where unexposed cohort DRS could be particularly useful versus if and when caution in use is warranted.
We focused our study on comparing analytic strategies to control for multiple measured confounding factors. In addition to methods described here, other methods to control for measured confounding in cohort studies may include stratification, subgroup analyses and matching.3, 25 Increasing restriction of study cohorts has recently been found to yield results closer to those seen in randomized controlled trials.26 Although appealing for sensitivity analysis, these methods are limited to large subgroups, and limit the generalizability of study findings. In our original paper,9 fracture risk was stratified into four groups based crudely on the two main risk factors measured in the dataset, age (median) and fracture history. Stratification based on full cohort DRS quintiles in the current study permits finer assessment of effect modification across all measured variables. A distinct advantage of full cohort DRS is therefore the ability to easily stratify results by overall disease risk based on the full set of potential confounding factors. Although DRSs values do not represent actual probability of disease and are thus not bound by 1, this does not affect their performance to control for confounding or demonstrate increasing risk across quintiles. This is because rankings of risk is preserved using DRSs derived from Cox proportional hazard models.8 Nonetheless, clinicians may still prefer stratification by individual covariates, as it is more transparent and thus easier to translate into practice.
The decision for which method to use for confounder adjustment when comparing multiple drug exposure depends on several factors. First, one must consider the design of the study. Each of the methods is applicable in the context of a cohort study. Conventional multivariate methods are compatible with case-control studies. However, simulation studies identify the potential for artifactual effect modification of odds ratios and reduced ability to control for measured confounding factors when applying EPS in case-control setting.27 The performance of the DRS in the case-control setting has not yet been assessed; since it can be viewed as a balancing score,24 it may have similar limitations as the EPS in case-control studies. Second, the relative number of exposures, covariates, and outcome events must be considered. When few covariates are to be adjusted relative to the number of outcome events, conventional methods are appropriate. With a large number of covariates and individuals in each exposure category and few outcome events, EPS may be most appropriate. With a large number of covariates and enough outcome events for stable DRS estimation, the full cohort DRS may be useful, particularly when one or more exposure categories are rare or when one wishes to examine effect modification by baseline risk. In situations where exposures are rare, the unexposed DRS method may be appropriate.
In summary, our data suggest that when there are sufficient numbers of exposed and sufficient numbers of outcomes, pharmacoepidemiologists may use either conventional methods, EPS or full cohort DRS (derived using the entire cohort) to adjust for potential confounding to compare the effects of multiple or binary drug exposures. Although our results based on a single study suggest that using the referent or unexposed DRS cohort method may introduce bias related to model extrapolation, further research is needed to clarify under which circumstances unexposed DRS may be advantageous and when caution is warranted. Until simulation studies are completed to clarify the benefits and limitations of unexposed DRS method, caution in use of unexposed cohort DRS methods may be warranted when there is evidence of drug channeling based on risk for the outcome.
Grant Support: This study was funded, in part, by a University of Toronto Connaught Start-Up Award. Dr Cadarette was supported by a Canadian Institutes of Health Research (CIHR) Post Doctoral Fellowship and is supported by CIHR and Osteoporosis Canada through a CIHR New Investigator Award in the Area of Aging and Osteoporosis; Dr Katz is supported by the National Institute of Arthritis and Muskuloskeletal and Skin Diseases (K24 AR02123 and P60 AR47782); Dr Solomon is supported by the Arthritis Foundation and the National Institutes of Health (K24 AR055989, R21 AG027066 and P60 AR47782); and Dr. Stürmer is supported by the National Institute on Aging (RO1 AG023178), and by the UNC-GSK Center for Excellence in Pharmacoepidemiology and Public Health.
Authors kindly acknowledge helpful discussions with Drs. Jerry Avorn, M. Alan Brookhart, Robert J. Glynn, and Kenneth J. Rothman, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School.