Usefulness of propensity scores and regression models to balance potential confounders at treatment initiation may be limited for newly introduced therapies with evolving use patterns.
To consider settings in which the disease risk score has theoretical advantages as a balancing score in comparative effectiveness research, because of stability of disease risk and the availability of ample historical data on outcomes in people treated before introduction of the new therapy.
We review the indications for and balancing properties of disease risk scores in the setting of evolving therapies, and discuss alternative approaches for estimation. We illustrate development of a disease risk score in the context of the introduction of atorvastatin and the use of high-dose statin therapy beginning in 1997, based on data from 5,668 older survivors of myocardial infarction who filled a statin prescription within 30 days after discharge from 1995 until 2004. Theoretical considerations suggested development of a disease risk score among non-users of atorvastatin and high-dose statins during the period 1995–1997.
Observed risk of events increased from 11% to 35% across quintiles of the disease risk score, which had a C-statistic of 0.71. The score allowed control of many potential confounders even during early follow-up with few study endpoints.
Balancing on a disease risk score offers an attractive alternative to a propensity score in some settings such as newly marketed drugs and provides an important axis for evaluation of potential effect modification. Joint consideration of propensity and disease risk scores may be valuable.
Special challenges apply to the control of confounding in studies of the safety and effectiveness of new and evolving therapies. Many covariates can influence choices among alternative therapies, and prescriber preferences often evolve quickly during the period of early experience with a specific drug or dose.1 Especially in early follow-up, there are typically relatively few study outcomes.
This setting of evolving prescriber preferences and relatively few outcomes can limit the use of both traditional multivariable models and propensity scores as approaches to obtain unbiased estimates of relative treatment effects. Whether one uses a case-control or prospective study design to compare outcomes across treatment groups, the number of potential confounders included in a standard regression approach is limited by the number of study endpoints. For example, reliable estimation in both logistic regression and proportional hazards models requires no more than one covariate (counting separately interaction and higher-order terms) for every 10 study outcomes.2 This can lead investigators to prioritize potential confounders and exclude some of theoretical relevance from multivariable analyses, leading to suboptimal confounder control.
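The rule of thumb above can be made concrete with a small calculation. The following is a minimal sketch, not part of the original analysis; the function name `max_covariates` is hypothetical, and the count of 203 outcomes is the number reported later in this article for the cohort used to develop the disease risk score.

```python
def max_covariates(n_outcomes: int, events_per_variable: int = 10) -> int:
    """Largest number of model terms supported under an events-per-variable rule.

    Interaction and higher-order terms each count as one term.
    """
    if n_outcomes < 0 or events_per_variable <= 0:
        raise ValueError("n_outcomes must be >= 0 and events_per_variable > 0")
    return n_outcomes // events_per_variable


# 203 outcomes (development cohort reported later in the article).
print(max_covariates(203))  # 20
# A hypothetical early-follow-up analysis with only 45 outcomes.
print(max_covariates(45))   # 4
```

With few outcomes, the budget of model terms shrinks quickly, which is the motivation for summarizing many covariates in a single score.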
Propensity scores are a valuable strategy to reduce the dimension of potential confounding variables and can be particularly useful when there are relatively few study endpoints.3,4 However, with new and evolving therapies, prescribers’ views about which patient characteristics indicate a specific drug choice are likely to change, and at varying rates across providers. Patients may also have varying attitudes about use of new therapies. Such evolving relationships of specific characteristics with treatment choices imply absence of a sharply defined propensity score, and can lead investigators to consider time-varying propensity scores.5 Additionally, if some variables are related to treatment choice but not to study outcomes, these instruments are best not included in the propensity score,6–8 but their identification in the setting of newly evolving therapies is challenging.
With these challenges to multivariable analysis and propensity score estimation, a disease risk score can be a useful tool for confounder control. Here we provide some background on the use of the disease risk score in epidemiology, consider controversies regarding its estimation, note the balancing properties of this score, and illustrate the development of a disease risk score with examples from the use of statins, including high-intensity statin regimens, after myocardial infarction.
Whereas a propensity score summarizes the way potential risk factors differ between users of alternative treatments to be compared, a disease risk score characterizes the relationship of risk factors with the study outcome. Summary measures of disease risk play a prominent role in guidelines for drug use and the evaluation of possible effect measure modification of new treatments or new indications for established therapies. For example, 5-year risk of invasive breast cancer estimated from the Gail model critically influences use of selective estrogen receptor modulators for risk reduction.9,10 Similarly, guidelines for the use of statins in the primary prevention of cardiovascular disease incorporate the Framingham risk score as a key determinant of treatment eligibility.11 Other summary measures of disease severity that often direct treatments include the APACHE II score in intensive care patients,12 the NIH stroke scale,13 and the Glasgow coma scale.14 Strengths of these scales include their applicability in different populations and time periods.
When summary evidence suggests the value of a treatment in a target population, a disease risk score provides an important axis for evaluation of possibly varying effects, and for characterization of subgroup-specific absolute treatment effects. For example, in consideration of the use of statins for primary prevention, treatment guidelines require specific information on risks and benefits within categories of absolute disease risk.15
Although disease risk scores with pre-specified weights are a useful tool for confounder adjustment in studies of treatment effectiveness and safety, they are seldom sufficient to completely control for potential confounding. In the use of administrative data, the predictive ability of available comorbidity scores such as those developed by Charlson et al.16 and Elixhauser et al.17 can be enhanced through estimation of weights for their components within the study population of interest.18,19 However, these risk scores are generally considered to be only one component of a strategy to control confounding, rather than a self-sufficient approach. Even when total mortality is the study endpoint, administrative datasets generally contain additional determinants of death that are not included in available scores. A wider view of such potential determinants is generally required for adequate confounder adjustment, compared with the perspective provided by construction of a parsimonious comorbidity index. As in the construction of a propensity score,20 the disease risk score should err on the side of inclusion of variables that show even a modest association with the outcome.
The notion that a study-specific disease risk score alone can control confounding and aid in causal inference has a substantial history. Peters21 and Belson22 proposed a two-step approach for confounder adjustment with the first-stage development of a model to predict the outcome among the unexposed, followed by adjustment for the predicted outcome in a comparison between the exposed and unexposed. Cochran23 described the conditions under which this Peters-Belson approach is preferable to multivariable adjustment for causal inference. In particular, this approach has theoretical advantages in the presence of effect-measure modification across the dimension of outcome risk in the unexposed, and has extensive applications in economics and health-services research.24
In considering the value of alternative summary confounder scores to reveal potential problems with a multivariable analysis of the effects of an exposure, Miettinen recommended the use of a form of disease risk score.25 Specifically, in the setting of a case-control study, he recommended inclusion of the exposure status and all potential confounding variables in a multivariate model to predict the study outcome. Then each subject’s predicted risk was obtained by setting the exposure status to zero, and stratified analysis was used to evaluate the relationship of the exposure and outcome.
An important theoretical development in understanding the disease risk score is an appreciation of its balancing properties as described by Hansen, which parallels the balancing property of the propensity score.26 Specifically, with a properly developed propensity score PS(X) to summarize the way a vector of covariates X predicts treatment assignment, Rosenbaum and Rubin showed that if sufficiently large groups of exposed and unexposed subjects with the same value of PS(X) are identified, these two groups will have the same distributions of all components of X.27,28 This implies that stratification or matching on the propensity score can yield a better exposed/unexposed balance of these measured covariates than would be obtained by randomized treatment assignment.29
In parallel, a well-formed disease risk score DR(X) has the property that the potential outcome if untreated is independent of covariates X, given DR(X). Note that this is a balance of disease risks, as distinct from the balance of treatment propensities provided by the propensity score. This prognostic balance can only be evaluated in the untreated. Further, as Hansen points out, its evaluation in the untreated subjects within a population including treated and untreated subjects requires an assumption: that the potential outcome if untreated is independent of the actual treatment assignment given X.26 This is an assumption of no unmeasured confounders outside X. Table 1 summarizes the aspects of study design that can influence the relative utility of disease risk scores and propensity scores.
The above discussion indicates three distinct populations in which a disease risk score can be developed: (1) an alternative data set or a time period prior to the current study, perhaps before introduction of a new therapy; (2) the study population, with estimation of disease risk in the unexposed group only, akin to the Peters-Belson method; and (3) the entire study population, with a model that includes indicators of exposure status, after which exposure status is set to zero to obtain each individual’s predicted risk, as suggested by Miettinen. Each approach seeks to estimate a disease risk score that will be the most representative of the study population, and each has both strengths and limitations.
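The difference between estimation in the unexposed only and estimation in the full cohort with exposure set to zero can be sketched in code. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the covariates and coefficients are invented, and the function names (`fit_logistic`, `drs_unexposed_only`, `drs_full_cohort`) are hypothetical. A plain gradient-ascent logistic fit is used only to keep the example self-contained; in practice a standard statistical package would be used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(beta, x):
    """Predicted outcome probability for covariate list x; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(rows, outcomes, n_iter=500, lr=0.5):
    """Fit logistic regression by batch gradient ascent (illustrative only:
    no regularization and no convergence check)."""
    k = len(rows[0])
    beta = [0.0] * (k + 1)
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, outcomes):
            err = y - predict_risk(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def drs_unexposed_only(X, treated, y):
    """Peters-Belson style: fit the outcome model in the unexposed,
    then predict risk for every subject."""
    rows = [x for x, t in zip(X, treated) if t == 0]
    outs = [yi for yi, t in zip(y, treated) if t == 0]
    beta = fit_logistic(rows, outs)
    return [predict_risk(beta, x) for x in X]


def drs_full_cohort(X, treated, y):
    """Miettinen style: fit with an exposure indicator in the model,
    then predict with exposure set to zero."""
    rows = [[t] + list(x) for x, t in zip(X, treated)]
    beta = fit_logistic(rows, y)
    return [predict_risk(beta, [0] + list(x)) for x in X]


# Synthetic cohort: two binary risk factors and a binary treatment (all invented).
random.seed(0)
X, treated, y = [], [], []
for _ in range(400):
    x = [int(random.random() < 0.5), int(random.random() < 0.3)]
    t = int(random.random() < 0.4)
    p = sigmoid(-1.5 + 1.2 * x[0] + 0.8 * x[1] - 0.3 * t)
    X.append(x)
    treated.append(t)
    y.append(int(random.random() < p))

scores_unexposed = drs_unexposed_only(X, treated, y)
scores_full = drs_full_cohort(X, treated, y)
```

Estimation in a historical period or separate data set corresponds to calling `fit_logistic` on that earlier population and applying `predict_risk` to the study cohort, which requires that covariates be defined the same way in both.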
Estimation of a disease risk score using all subjects in the study population, based on a model with an indicator for exposure status, benefits from the ready availability of the data set and from a larger sample size than estimation restricted to the unexposed, and so can yield more reliable estimates of disease risk under the assumption of a correct model form. Several simulation studies have found that stratification on a disease risk score obtained in this way (according to the suggestion of Miettinen) performs comparably to both propensity score stratification and multivariable analysis, as long as covariates are not too highly correlated with exposure.30–33 Further, within the context of the scenarios examined, this full-cohort disease risk score can sometimes outperform a disease risk score estimated in the unexposed subjects only.33 However, as pointed out by Hansen, the validity of the disease risk score estimated in this way is sensitive to model form, especially the assumption of a uniform treatment effect across categories of disease risk. Even modest treatment effect heterogeneity can induce bias in the overall treatment effect with this approach. If the treatment groups differ substantially on important covariates (which is akin to a clear distinction of treatment groups by means of a propensity score), these concerns are heightened. Further, inclusion of the exposure effect in the estimation of the disease risk score limits its value as a balancing score, as also discussed by Hansen.
Estimation of the disease risk score among unexposed subjects in the study population is also readily implementable, makes fewer assumptions than standard approaches that include exposed subjects, and yields a balancing score with desirable theoretical properties. However, reliable estimation of the model is a particular challenge in settings with relatively few outcomes, and these are expected among the unexposed subjects in the early monitoring period for a new therapy. Further, if the disease risk score is used to form strata for estimation of treatment effects within levels of disease risk, the over-fitting of the model under this approach tends to over-estimate treatment benefits in the high-risk group and under-estimate treatment harms in the low-risk group, which substantially limits the value of the score as an axis upon which to evaluate potential effect measure modification.
Further, if the estimated disease risk score is strongly correlated with exposure status, the biases found by Pike and colleagues34 to be associated with stratification by the disease risk score estimated by the approach of Miettinen also apply to disease risk scores estimated in the unexposed.26 These concerns have probably contributed to the relatively infrequent use of disease risk scores in pharmacoepidemiology.35 However, if exposed and unexposed subjects differ substantially on important determinants of disease risk, such that the shared support of risk factor distributions is limited, then valid comparison of treatments in an observational setting becomes less feasible,36,37 and stratification on either a risk or propensity score is a useful way to identify such non-overlap. Rather than a limitation, the ready ability to identify the kinds of subjects who almost always receive one specific treatment, and who thus should probably not be included in an analysis of comparative effectiveness, is a strength of both propensity score and disease risk score methods in pharmacoepidemiology.35,36,38
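One simple way to identify non-overlap is to compare the range of an estimated score across treatment groups and flag subjects outside the shared range. The sketch below is an illustration under assumptions, not a method described in this article: the function names and the example scores are hypothetical, and a range-based rule is only one of several possible definitions of shared support.

```python
def common_support(scores, treated):
    """Return (low, high) bounds of the score range shared by both groups."""
    exposed = [s for s, t in zip(scores, treated) if t == 1]
    unexposed = [s for s, t in zip(scores, treated) if t == 0]
    if not exposed or not unexposed:
        raise ValueError("both treatment groups must be non-empty")
    low = max(min(exposed), min(unexposed))
    high = min(max(exposed), max(unexposed))
    return low, high


def flag_in_support(scores, treated):
    """True for subjects whose score lies within the shared range."""
    low, high = common_support(scores, treated)
    return [low <= s <= high for s in scores]


# Hypothetical scores (disease risk or propensity) for eight subjects.
scores = [0.05, 0.20, 0.30, 0.60, 0.25, 0.40, 0.70, 0.90]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
print(common_support(scores, treated))   # (0.25, 0.6)
print(flag_in_support(scores, treated))
```

Subjects flagged `False` are those for whom no comparable subject exists in the other treatment group, so they are candidates for exclusion from a comparative analysis.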
The disease risk score can also be estimated with data from a time period prior to the study period, or from a separate population. However, one difficulty with estimation in a separate population is that covariate assessments may differ from those in the study population. In the context of evaluation of a new therapy, the time period before its introduction in the target study population may be useful. The reasoning behind this approach is that in times of evolving therapies, the disease risk in the population may be more stable than the propensity score. We illustrate this approach in the examples that follow.
We use data on statin therapy in patients after myocardial infarction to illustrate the development of a disease risk score in the context of new and evolving therapies. Large-scale randomized trials conducted between 1994 and 1998 demonstrated the value of statin therapy after myocardial infarction for prevention of recurrent myocardial infarction, stroke, and cardiovascular death.39–41 Later trials showed that higher statin doses yielded greater risk reductions in this population.42,43 We consider use of a disease risk score to evaluate the relative effectiveness of atorvastatin (Lipitor, the first high-intensity statin marketed in the US), beginning with its first availability at the beginning of 1997; we also consider the efficacy of more intense statin therapy (defined according to the algorithm of Choudhry et al.44), which also was seldom used prior to this time.
We studied 5,668 enrollees aged 65–100 years in either New Jersey or Pennsylvania’s state-sponsored pharmacy assistance program who survived a myocardial infarction and filled a statin prescription within 30 days after discharge between January 1, 1995 and December 31, 2004.45,46 Figure 1 shows the strong time trend in the percent of such first post myocardial infarction prescriptions that were either atorvastatin or a high-dose statin. The efficacy endpoint was the composite including recurrent myocardial infarction, stroke, or death within 1 year after statin initiation. The analytic challenge was to develop an approach to control for multiple potential confounding variables that was applicable even during the early years of use of atorvastatin and high-dose statin therapy (i.e. 1997 and 1998) and consistent with confounder control in later experience. The study was approved by the Institutional Review Board of Partners Healthcare.
The principles discussed above suggested development of a disease risk score based on the unexposed individuals in the period just before exposure availability. Thus, we used data from individuals who used statins other than atorvastatin or high-dose statins after myocardial infarction in 1995 and 1996, but also included the individuals without exposure to these drugs following index myocardial infarction in 1997. We felt that the indications for exposure were still evolving and uncertain in that year, and the additional data would improve the reliability of the risk score. We used a logistic regression model to develop a disease risk score based on 826 patients who initiated statin therapy other than atorvastatin or high-dose statin, among whom 203 had recurrent myocardial infarction, stroke, or death within a year after statin initiation. Variables included in the disease risk score model were demographic characteristics, indicators of specific diseases encoded in medical encounters during the preceding year, summary measures of comorbidity (Charlson index and numbers of different generic drugs with prescription filled in the past year), and circumstances of the index hospitalization including angiography and duration of stay.
Table 2 shows the 20 variables included in the disease risk score, their prevalence or median in the population used to develop the score, and their multivariable relationship with recurrent myocardial infarction, stroke, or death. Variables associated with increased risk of the composite outcome were older age, black race, history of heart failure, and higher Charlson score, whereas angiography during the index hospitalization and a diagnosis of hypertension were associated with reduced risk. Overall, the disease risk score model had a C-statistic of 0.71 to predict 1-year risk of recurrent myocardial infarction, stroke, or death from any cause.
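For readers less familiar with the C-statistic, it is the proportion of (event, non-event) pairs in which the subject with the event has the higher predicted risk, with ties counted as one half. A minimal sketch follows; the function name and the four-subject example are hypothetical and are not taken from the study data.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Compares every event with every non-event; a pair is concordant when
    the event has the higher score, and ties contribute 0.5.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


# Hypothetical predicted risks and observed outcomes for four subjects.
print(c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 0.75
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination. The pairwise loop is quadratic in sample size, which is acceptable for a cohort of a few thousand subjects.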
Based on this model, the predicted disease risk in statin initiators from 1997 through 2004 had a wide range, with mean predicted probabilities of 0.272 in atorvastatin initiators, 0.277 in initiators of other statins, 0.272 in initiators of high-dose statins, and 0.276 in initiators of lower-dose statins (Figures 2 and 3). Distributions of disease risk scores overlapped broadly and were similarly shaped across treatment groups. The slightly lower mean disease risk scores in the atorvastatin and high-dose statin groups reflected younger average ages, higher rates of angiography, and decreased prevalence of an index hospitalization lasting 10 or more days (Table 3).
Several approaches are possible in the use of disease risk scores for confounder control in comparative effectiveness research, including matching exposure groups on risk levels, stratified analysis, and multivariate adjustment. Table 4 shows the impact of adjustment for the estimated disease risk score in comparisons of atorvastatin versus other statin regimens, and high versus lower doses of statins. Adjustment for disease risk as a continuous variable led to slight changes of crude estimates of 7–8% reductions in the odds of recurrent myocardial infarction, stroke, or death associated with atorvastatin treatment or treatment with high-dose statins.
The disease risk score may have particular utility for the control of confounding in early follow-up after introduction of a new therapy. Parallel logistic regression analyses controlling for disease risk score and restricted to the 897 individuals who initiated statin therapy after myocardial infarction during the 2-year period 1997–1998 found that users of atorvastatin had somewhat lower risk relative to users of other statins (odds ratio 0.71; 95% CI: 0.5–1.0) and that users of high-dose statins had reduced risk relative to users of lower-dose statin therapy (odds ratio 0.57; 95% CI: 0.3–1.1).
Stratification on the disease risk score provides a straightforward approach to evaluate possible effect measure modification across levels of disease risk (Table 5). Observed risk of the composite outcome ranged across quintiles of the disease score from 12.6% to 31.4% in atorvastatin-treated patients, and from 11.8% to 32.4% in patients treated with high-dose statins. Generally, odds ratios associated with atorvastatin therapy as well as with high-dose statin therapy tended towards greater reductions in higher-risk individuals, although confidence intervals were wide and broadly overlapping.
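The stratified analysis described above can be sketched as follows. This is an illustrative outline under assumptions, not the analysis behind Table 5: the function names and example numbers are hypothetical, quintiles are assigned by rank, and the odds ratio is the unadjusted 2x2 estimate with a Woolf (log-based) confidence interval rather than an estimate from a logistic regression model.

```python
import math


def quintile_labels(scores):
    """Assign each score to a quintile 1..5 by rank (ties broken by input order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def observed_risk_by_stratum(labels, outcomes):
    """Observed proportion with the outcome in each stratum."""
    totals, events = {}, {}
    for q, y in zip(labels, outcomes):
        totals[q] = totals.get(q, 0) + 1
        events[q] = events.get(q, 0) + y
    return {q: events[q] / totals[q] for q in sorted(totals)}


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for one stratum.

    a, b: exposed with and without the outcome;
    c, d: unexposed with and without the outcome.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical scores and outcomes for ten subjects.
scores = [i / 10 for i in range(10)]
outcomes = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
labels = quintile_labels(scores)
print(labels)                                   # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(observed_risk_by_stratum(labels, outcomes))
print(odds_ratio_with_ci(10, 90, 20, 80))       # hypothetical stratum counts
```

Applying `odds_ratio_with_ci` within each quintile gives stratum-specific estimates that can be compared across levels of disease risk. With sparse strata the intervals will be wide, as noted in the text.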
We also considered development of propensity scores as a strategy to control confounding in the evaluation of relationships of atorvastatin and high-dose statin therapy with outcomes in these data. Challenges to the use of propensity scores in this setting included uncertainty about evolving prescriber preferences in the face of new evidence on benefits of these therapies during the study period, and small numbers of individuals initiating high-dose statin therapy during early follow-up (Figure 1), which limited ability to estimate time-period specific propensity scores with all covariates included. In particular, with only 68 initiators of high-dose statins during the years 1997–1998, a logistic model predicting this treatment and including all twenty covariates showed evidence of over-fitting, and had coefficients with large differences from the estimates obtained in a model based on data from 1999–2004. In light of the small number of initiators of high-dose statins during early follow-up, it was unclear whether apparent differences from propensity score estimates in later follow-up represented sampling variability or true changes in treatment preferences.
In comparative effectiveness and safety analyses with evolving therapies, the disease risk score may be a valuable tool to balance important covariates across treatment groups, identify types of subjects with non-overlap between treatment groups where valid comparisons of comparative effects are impossible, and to evaluate potential treatment effect measure modification.
Compared with propensity scores, disease risk scores are far less commonly used and have more theoretical shortcomings for comparative effectiveness research. In particular, while both approaches share the useful ability to reduce to one the dimension of potential confounders, balance with respect to the disease risk score can be evaluated only in the untreated, and estimation of this score within the study population is potentially problematic. Nonetheless, in the setting of early evaluation of evolving therapies, where reduction of the dimension of the confounder space is particularly desirable, no coherent propensity score may exist because of changing patient and provider preferences. The disease risk score is likely to be more stable over time, and the work required to estimate this score based on recent history in the health system under study may improve estimates of comparative effectiveness. Further, while the disease risk score is less useful in settings with rare outcomes where reliable multivariable risk prediction is problematic, the risk score approach has advantages in studies of multiple exposures such as our consideration of atorvastatin and high-dose statin therapy, where a single score is applicable to all exposure categories.47
Another advantage of the disease risk score is its utility as a scale for arguably the most important dimension in the evaluation of possible effect measure modification. Absolute disease risk plays a critical role in many treatment decisions. Stratification on a disease risk score provides a transparent approach to compare absolute and relative treatment effects on this important axis.
However, one need not choose between a disease risk score and propensity score approach to balance potential confounders. As an approach to match subjects in alternative treatment groups, one can minimize the distance in both the disease risk and propensity score dimensions; nor must one weight the distances in these two dimensions equally. With common exposures and fewer data on outcomes, one can emphasize the distance on the propensity score dimension via overweighting. Conversely, with new or rare treatments, one can emphasize the disease risk score dimension.
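Matching on both scores with unequal weights can be sketched as a weighted distance in the two score dimensions. The example below is a minimal illustration under assumptions: the function name, the greedy 1:1 nearest-neighbor strategy without replacement, the weighted Euclidean distance, and all identifiers and scores are hypothetical choices, not a procedure specified in this article. In practice the two scores would usually be placed on comparable scales before weighting.

```python
import math


def match_on_two_scores(treated, controls, w_drs=1.0, w_ps=1.0):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: lists of (subject_id, disease_risk_score, propensity_score).
    Distance is sqrt(w_drs * (DRS difference)^2 + w_ps * (PS difference)^2).
    """
    available = list(controls)
    pairs = []
    for t_id, t_drs, t_ps in treated:
        if not available:
            break  # no controls left to match
        best = min(
            available,
            key=lambda c: math.sqrt(
                w_drs * (t_drs - c[1]) ** 2 + w_ps * (t_ps - c[2]) ** 2
            ),
        )
        pairs.append((t_id, best[0]))
        available.remove(best)
    return pairs


# One treated subject and two candidate controls (hypothetical values).
treated = [("t1", 0.30, 0.20)]
controls = [("c1", 0.31, 0.60), ("c2", 0.45, 0.21)]

# Emphasize disease risk (e.g. a new or rare treatment): picks c1.
print(match_on_two_scores(treated, controls, w_drs=1.0, w_ps=0.1))
# Emphasize propensity (e.g. common exposure, few outcomes): picks c2.
print(match_on_two_scores(treated, controls, w_drs=0.1, w_ps=1.0))
```

The same treated subject is matched to a different control depending on which dimension is emphasized, which is the trade-off described above.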
In summary, we believe the disease risk score is a useful tool with unique strengths for comparative safety and effectiveness research on new and evolving therapies. Evidence on comparative effectiveness of medications is particularly needed shortly after market approval, when insurance coverage decisions must be made. Products marketed with evidence of superior benefits or more favorable safety profiles, as compared to existing alternatives, will likely receive positive coverage conditions and therefore experience rapid uptake in the marketplace. Insurers seek timely comparative data to avoid fast and diffuse adoption of less effective or possibly harmful drugs; once prescribing patterns are established, they are difficult to change, even in the face of compelling comparative effectiveness evidence.
Initiatives such as the Sentinel System of the Food and Drug Administration reflect a heightened interest in identifying adverse effects and benefits of drugs and new doses as soon after marketing as possible. Joint consideration of both propensity and disease risk scores for new therapeutics has the potential to improve estimates of comparative effectiveness.
Supported by grants AG018833 and AG023178
Disclosure: Dr. Glynn has received grants from AstraZeneca and Novartis for clinical trials monitoring and analysis and has given invited Grand Rounds at Merck. Dr. Schneeweiss is Principal Investigator of the Brigham and Women’s Hospital DEcIDE Center on Comparative Effectiveness Research and the DEcIDE Methods Center (both funded by AHRQ) and of the Harvard-Brigham Drug Safety and Risk Management Research Center (funded by the FDA). Dr. Schneeweiss is a paid member of the Scientific Advisory Board of HealthCore and consultant to WHISCON and Booz & Co.; in addition, he is the recipient of investigator-initiated grants from Pfizer and Novartis.