We have combined, for the first time, two existing epidemiologic methods, propensity scores and regression calibration, to control for unmeasured confounding in cohort studies using validation data. By taking the joint distribution of unmeasured confounders into account, PSC extends prior work based on sensitivity analyses and on adjustments for single confounders (1). In our example, this novel approach appeared effective in controlling for unmeasured confounding, virtually eliminating the presumably spurious 'protective' effect of NSAID use on mortality. Moreover, the residual confounding due to unmeasured and poorly measured covariates was large enough to qualitatively change the likely spurious association between NSAID use and all-cause mortality. The proposed strategy is easy to conceptualize and to implement using user-friendly, well-documented, and widely available software.
PSC needs to be compared with other methods that address unobserved confounding in non-experimental research (1). Others have proposed sensitivity analyses addressing the effect of several confounders by using external information on the relation of unobserved confounders to the exposure and disease of interest (8). To our knowledge, none of these methods can fully address the joint effect of several unobserved confounders. Another important advantage of PSC is that it does not depend on outcome information in the validation study, i.e. it can be applied using cross-sectional external validation studies without having to specify the confounder-disease association. In this setting, imputation of gold-standard PS values in the main study from the error-prone PS, the exposure, and the regression parameters of the error model in the validation study is essentially a form of RC. It has recently been shown that the two are algebraically identical (26), but estimation of adjusted standard errors that take the uncertainty in the error-regression model into account would need to rely on bootstrapping or on derivation and programming of asymptotic variances (14).
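The calibration procedure described above can be sketched as follows. This is a minimal Python simulation, not the implementation used in the paper: the covariate `x1`, the unmeasured confounder `u`, the sample sizes, and all coefficients are hypothetical, and scikit-learn stands in for the macro the authors reference.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n_main, n_val = 5000, 1000

def simulate(n):
    # x1 is measured in both studies; u is the unmeasured confounder,
    # observed only in the validation study (hypothetical setup).
    x1 = rng.normal(size=n)
    u = rng.normal(size=n)
    p_exposure = 1 / (1 + np.exp(-(0.5 * x1 + 0.8 * u)))
    e = rng.binomial(1, p_exposure)
    return x1, u, e

x1_m, u_m, e_m = simulate(n_main)   # u_m would be unobserved in practice
x1_v, u_v, e_v = simulate(n_val)

# 1. Error-prone PS: exposure model using main-study covariates only.
ep_model = LogisticRegression().fit(x1_m[:, None], e_m)
ps_ep_main = ep_model.predict_proba(x1_m[:, None])[:, 1]
ps_ep_val = ep_model.predict_proba(x1_v[:, None])[:, 1]

# 2. Gold-standard PS: exposure model with the extended covariate set,
#    estimable only in the validation study.
gs_model = LogisticRegression().fit(np.column_stack([x1_v, u_v]), e_v)
ps_gs_val = gs_model.predict_proba(np.column_stack([x1_v, u_v]))[:, 1]

# 3. Error model (regression calibration): regress the gold-standard PS
#    on the error-prone PS and exposure in the validation study.
err_model = LinearRegression().fit(np.column_stack([ps_ep_val, e_v]), ps_gs_val)

# 4. Impute ("calibrate") the gold-standard PS in the main study and use
#    it in place of the error-prone PS in the outcome model.
ps_cal_main = err_model.predict(np.column_stack([ps_ep_main, e_m]))
```

Standard errors from an outcome model fitted with `ps_cal_main` would understate the uncertainty contributed by step 3; as noted above, bootstrapping steps 1-4 together is one way to obtain adjusted confidence intervals.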
In the setting of an internal validation study, i.e. when information on an extended set of covariates and on the outcome is available for a subsample of the main study, unobserved confounding in the main study can be viewed as a missing-covariate problem. In this context, methods to address missing data, including multiple imputation (28) and Bayesian methods (29), may be applied.
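For the internal-validation setting, a multiple-imputation approach along these lines can be sketched in Python. Everything here is a hypothetical illustration: a single missing confounder `u`, a linear outcome model for simplicity, and an imputation model fitted in the validated subsample. A fully proper imputation would also draw the imputation-model parameters from their posterior; that step is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_val = 4000, 800        # hypothetical sizes; first n_val rows are validated
x1 = rng.normal(size=n)
u = rng.normal(size=n)      # unmeasured outside the validated subsample
e = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x1 + 0.8 * u))))
y = 0.0 * e + 0.4 * x1 + 0.6 * u + rng.normal(size=n)  # true null exposure effect

def ols(X, y):
    """Least squares with conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(np.diag(XtX_inv) * sigma2), np.sqrt(sigma2)

# Imputation model for the missing confounder, fitted in the validated rows.
V = np.column_stack([np.ones(n_val), x1[:n_val], e[:n_val], y[:n_val]])
gamma, _, resid_sd = ols(V, u[:n_val])

m = 20
est, var = [], []
for _ in range(m):
    u_imp = u.copy()
    A = np.column_stack([np.ones(n - n_val), x1[n_val:], e[n_val:], y[n_val:]])
    # Proper imputation adds residual noise to each draw.
    u_imp[n_val:] = A @ gamma + rng.normal(scale=resid_sd, size=n - n_val)
    X = np.column_stack([np.ones(n), e, x1, u_imp])
    beta, se, _ = ols(X, y)
    est.append(beta[1])
    var.append(se[1] ** 2)

# Rubin's rules: total variance = within + (1 + 1/m) * between.
qbar = np.mean(est)
total_var = np.mean(var) + (1 + 1 / m) * np.var(est, ddof=1)
```

Pooling via Rubin's rules is what distinguishes this from single imputation: the between-imputation term propagates the uncertainty about the missing confounder into the final confidence interval.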
We present a single example to illustrate the applicability of the analytic strategy and to encourage further research into its validity and limitations in different settings. Many of the issues regarding measurement error and validation samples have been addressed previously and are likely to apply to PSC as well. Accordingly, the proposed method is approximately unbiased, with the amount of residual bias depending on established theoretical and analytic properties of propensity scores and regression calibration. The combination of these methods could, however, lead to analysis of parameter constellations not previously assessed (17). Validation studies often are not a perfect random sample of the main study population, but as long as the sampling is representative of the error mechanism, valid estimation and inference will follow from RC even if the marginal distributions of variables in the main study do not match those of the validation study (32). RC approximations break down when the measurement error is large, i.e. when the correlation between the error-prone and 'gold-standard' PS is weak; in our example, these correlations were moderate (0.56-0.68). The sizes of the main and validation studies have a large impact on the precision of the resulting estimates (33), and both were very large in our example. As a result, using PSC led to only a minor increase in the width of the confidence interval for the exposure of interest.
Our approach is easy to implement (see appendix 1). PS are increasingly used in medical research (35), although many issues concerning their optimal implementation remain unresolved. RC is the most widely applied approach to address measurement error in multivariate epidemiologic analyses. Its advantages include the ability to adjust for confounding in the primary regression by continuous, discrete, and ordinal variables that are assumed to be perfectly measured, as well as the availability of an easy-to-implement (see appendix 1) macro providing adjusted effect estimates and standard errors for a variety of multivariate models.
We chose the example of NSAID use and all-cause mortality because NSAID use is unlikely to reduce mortality in this elderly population (36). Glynn et al. argued that in an elderly population selected drug classes, including NSAID, are more likely to be prescribed to healthier subjects who are further from death (36). NSAID use in the elderly is associated with several adverse outcomes, including an increased risk of gastrointestinal hemorrhage (37), impaired kidney function (41), and hypertension (44). Therefore, no association with mortality, or a slightly increased risk, seems biologically more plausible than any reduction in risk.
The design of validation studies has been extensively addressed (46), and these considerations are likely to apply to PSC as well. It is important to note that medication use was assessed from prescription claims data in the main study but by interview in the validation study. Ideally, medication use would have been assessed from prescription data in the validation study as well, in order to reproduce the error-prone PS exactly. Depending on the relative quality of these sources of information on medication use (51), the error in the error-prone PS might have been somewhat under- or overestimated in our example. The observed differences in the prevalence of medication use per se do not invalidate the approach (34).
We used the PS as a continuous variable in our disease and error models, which depends on the assumption of a log-linear association with the dependent variable. This assumption can easily be tested in the measurement error model (equation 5), for example by including categories or quadratic and cubic terms, and it was essentially met in our study. Like regression calibration, PSC also depends on the assumption that the error-prone variable is a surrogate for the gold-standard variable, i.e. that the error-prone propensity score is independent of disease given the gold-standard propensity score and exposure (27, 52; see appendix 2). This strong assumption cannot be tested without information on the outcome, e.g. in an external validation study. If it is violated, PSC can be far less useful and even counterproductive, in that it can increase rather than decrease bias. If information on the outcome is available in the validation study (e.g. in an internal validation study), one can check whether the error-prone propensity score is independent of disease given the gold-standard propensity score and exposure. If so, the assumption may be considered valid in the context of the main study as well.
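The linearity check on the error model can be carried out by comparing nested fits, as in this Python sketch. The validation-study quantities here are simulated and hypothetical; by construction the gold-standard PS is linear in the error-prone PS, so the quadratic term should add essentially nothing.

```python
import numpy as np

rng = np.random.default_rng(2)
n_val = 1000
# Hypothetical validation-study quantities: error-prone PS, exposure, and a
# gold-standard PS that is (by construction) linear in the error-prone PS.
ps_ep = rng.uniform(0.1, 0.9, size=n_val)
e = rng.binomial(1, ps_ep)
ps_gs = np.clip(
    0.1 + 0.7 * ps_ep + 0.05 * e + rng.normal(scale=0.05, size=n_val), 0, 1
)

def fit_r2(X, y):
    """R-squared of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_linear = fit_r2(np.column_stack([ps_ep, e]), ps_gs)
r2_quadratic = fit_r2(np.column_stack([ps_ep, ps_ep ** 2, e]), ps_gs)
# A negligible gain in R^2 from the quadratic term supports linearity;
# a formal test of the quadratic coefficient would serve the same purpose.
```

The same comparison can be made with categorized versions of the error-prone PS, as the text suggests; the choice of expansion (categories vs. polynomial terms) is a matter of convenience.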
The assumption that the gold-standard propensity score has a linear association with the logit of disease given exposure (9) cannot be tested for the target disease model (equation 6). Since the gold-standard propensity score captures all of the confounding in a single covariate, misspecification of its association with the outcome is likely to have pronounced effects on its ability to control for confounding (53).
We did not assess the role of variable-selection procedures when estimating the PS. With large datasets it might be best to err on the side of including apparently unimportant variables as well as interactions between variables (12), a topic that is beyond the scope of this paper. Age was the only continuous variable in our main study dataset, and results were virtually unchanged when age was included in categories instead of as a continuous variable in the PS models. The number of exposed individuals in the validation dataset will limit the number of variables that can be used to estimate the PS, and therefore the applicability of the method, in smaller studies. The confidence intervals from PSC take the uncertainty in the estimation of the error model into account, but not the uncertainty due to model misspecification or to the use of different definitions of some covariates, including NSAID use. Confidence intervals should therefore be interpreted only as a rough guide to, and a minimum estimate of, the inherent uncertainty (54).
We conclude that the propensity score calibration approach, i.e. the combination of propensity scores and regression calibration, can substantially improve control for confounding by unmeasured and imperfectly measured confounders in cohort studies when internal or external validation data are available. The approach might be especially advantageous for pharmacoepidemiologic research based on claims data. Once the advantages and limitations of the approach have been assessed in a variety of settings and parameter constellations, it might help to address one of the main problems in pharmacoepidemiology using large-scale claims databases, i.e. the lack of information on important confounders, and could lead to improvement in the quality of research in this field.