In the following sections, we draw on published work to illustrate how external validation data that lack information on the outcome of interest can be used to adjust estimates of an exposure-outcome association. This application example has been described in detail previously36 and is only summarized here.
There is no known biological reason to expect that nonsteroidal anti-inflammatory drugs (NSAIDs) would reduce the risk of death (indeed, there is some evidence to the contrary). Nevertheless, Glynn et al. observed that NSAID use was associated with a strong reduction in the risk of short-term mortality (relative risk = 0.74) in elderly hospitalized patients, even after controlling for a wide variety of health indicators available in claims data.6
This association is likely due to selection bias leading to strong unmeasured confounding: physicians are less likely to prescribe NSAIDs (eg, compared with narcotics) to frail older adults, to patients with advanced cancer, and to patients with a variety of other comorbidities associated with high mortality, including renal disease. Some of these conditions, however, are measured in claims data, so a single unobserved confounder would need to be both prevalent and strongly associated with avoidance of NSAIDs and with mortality to explain the strong inverse association between NSAIDs and mortality. Although such a single confounder might be implausible, joint confounding by a variety of confounders, each only moderately associated with NSAID use and mortality and not very prevalent, but all acting in the same direction, might nevertheless be plausible.
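The argument about one strong confounder versus several modest ones can be made concrete with the classic external-adjustment formula for a single binary unmeasured confounder, which gives the factor by which confounding multiplies the true relative risk. The sketch below is illustrative only: the function names are ours, every prevalence and relative risk in it is hypothetical (not taken from Glynn et al. or the main study), and combining confounders by multiplying their bias factors assumes they are mutually independent within exposure groups and act multiplicatively on risk without interaction.

```python
from math import prod

def bias_factor(p_exposed: float, p_unexposed: float, rr_cd: float) -> float:
    """Multiplicative bias in the exposure-outcome RR from one binary confounder C.

    p_exposed, p_unexposed: prevalence of C among exposed / unexposed subjects
    rr_cd: outcome relative risk for C=1 vs C=0 (assumed equal in both exposure groups)
    """
    return (p_exposed * (rr_cd - 1) + 1) / (p_unexposed * (rr_cd - 1) + 1)

def joint_bias_factor(confounders) -> float:
    """Product of single-confounder bias factors.

    Assumes the confounders are independent of one another within exposure
    groups and act multiplicatively (no interaction) on the outcome risk.
    """
    return prod(bias_factor(p1, p0, rr) for p1, p0, rr in confounders)

def adjusted_rr(observed_rr: float, confounders) -> float:
    """Divide the assumed confounding bias out of an observed RR."""
    return observed_rr / joint_bias_factor(confounders)

# Hypothetical: one strong, prevalent confounder (10% of users vs 40% of
# non-users) that triples mortality risk.
single = [(0.10, 0.40, 3.0)]
# Hypothetical: three modest, less prevalent confounders acting in the same direction.
several = [(0.05, 0.15, 2.0)] * 3

print(round(joint_bias_factor(single), 3))   # 0.667
print(round(joint_bias_factor(several), 3))  # 0.761
print(round(adjusted_rr(0.74, several), 2))  # 0.97
```

Under these made-up inputs, three confounders that are individually unremarkable produce almost as much downward bias as the single strong one, which is the pattern the paragraph above describes.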
To incorporate information on joint confounding by unmeasured covariates using data from a cross-sectional external validation study, we combined propensity scores (PS)37 and regression calibration38 into propensity score calibration (PSC).36
The PS is defined as the conditional probability of exposure (to a drug) given observed covariates. It is usually estimated from the data at hand using multivariable logistic regression. Individuals with the same estimated PS are then thought to have the same chance of being exposed. As a group, treated and untreated subjects paired on the same PS will therefore have similar distributions of the observed covariates, so that comparisons between them are unconfounded by those covariates.37
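A minimal sketch of PS estimation by multivariable logistic regression is shown below, fitted by Newton-Raphson in NumPy so that no modeling library is assumed. The covariates (frailty, age), their effects, and the data are all simulated and hypothetical; stratifying on PS quintiles stands in here for pairing on the PS, as a quick way to see that users and non-users become more similar on a measured covariate within strata.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic-regression coefficients by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def propensity_score(covariates, exposure):
    """Estimated P(exposure = 1 | covariates) from a multivariable logistic model."""
    X = np.column_stack([np.ones(len(exposure)), covariates])
    return 1.0 / (1.0 + np.exp(-(X @ fit_logistic(X, exposure))))

# Hypothetical simulated data: frailty and older age both lower the chance of NSAID use.
rng = np.random.default_rng(0)
n = 20_000
frail = rng.binomial(1, 0.3, n)
age = rng.normal(75, 6, n)
nsaid = rng.binomial(1, 1.0 / (1.0 + np.exp(0.5 + 1.0 * frail + 0.08 * (age - 75))))
ps = propensity_score(np.column_stack([frail, age]), nsaid)

# Frailty prevalence in users vs non-users, overall and within PS quintiles.
print(f"crude: {frail[nsaid == 1].mean():.3f} vs {frail[nsaid == 0].mean():.3f}")
stratum = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
for s in range(5):
    m = stratum == s
    users, nonusers = frail[m & (nsaid == 1)], frail[m & (nsaid == 0)]
    print(f"quintile {s + 1}: {users.mean():.3f} vs {nonusers.mean():.3f}")
```

Any covariate left out of `propensity_score` would, of course, not be balanced by this procedure, which is exactly the limitation PSC addresses.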
Regression calibration is a method to correct effect estimates for measurement error.38
In the context of the generally sparse use of methods to correct for measurement error in epidemiology, regression calibration is the most widely used approach. It is based on data from a validation study that includes both the “error-prone” measure of the variable used in the main study and an additional “gold-standard” measure of the same variable. Within the validation study, one estimates a linear measurement error model with the “gold-standard” (“true”) variable as the dependent variable and the “error-prone” variable and the variables measured without error as independent covariates. Under the main assumption that the “error-prone” variable contains no information on the outcome beyond that contained in the “gold-standard” variable (surrogacy),39 regression calibration then uses the estimates from this measurement error model to correct the “naive” regression estimates obtained with the error-prone variable in the main study.36
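The correction step can be written out for the simplest case: one error-prone variable Z, its gold-standard counterpart T, and one covariate X measured without error (the notation is ours, not the original's). Surrogacy, E[Y | T, Z, X] = E[Y | T, X], is what allows E[T | Z, X] to be substituted for T in the outcome model; the substitution is exact for an identity link and an approximation for logistic or proportional hazards models, generally adequate when effects and measurement error are modest.

```latex
% T = gold-standard variable, Z = error-prone variable,
% X = covariate measured without error, g = link function of the outcome model
\begin{aligned}
\text{Validation study:}\quad & E[T \mid Z, X] = \gamma_0 + \gamma_Z Z + \gamma_X X \\
\text{Target model:}\quad & g\{E[Y \mid T, X]\} = \alpha + \beta_T T + \beta_X X \\
\text{Naive main-study model:}\quad & g\{E[Y \mid Z, X]\} = a + b_Z Z + b_X X \\[4pt]
\text{Substituting } E[T \mid Z, X] \text{ for } T:\quad & b_Z \approx \beta_T \gamma_Z, \qquad b_X \approx \beta_X + \beta_T \gamma_X \\
\text{Corrected estimates:}\quad & \hat\beta_T = \frac{b_Z}{\gamma_Z}, \qquad \hat\beta_X = b_X - \frac{b_Z\,\gamma_X}{\gamma_Z}
\end{aligned}
```

With X omitted, this reduces to the familiar attenuation correction, in which the naive coefficient is divided by the calibration slope.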
Regression calibration is used mainly to correct associations between continuous exposures (eg, blood pressure, nutrients) and outcomes for measurement error in the exposure of interest.
To apply regression calibration to adjust for the joint confounding by multiple confounders unobserved in the main study, we first combine all confounders into a single score, the PS, and assume that the PS estimated in the main study from a subset of important confounders is measured with error. This error can be estimated in an external validation study using data on additional confounders. Regression calibration can then correct for that measurement error and, thereby, for the unmeasured confounding it represents.
To implement PSC, we first controlled for measured confounding in the main cohort using the “error-prone” PS estimated in the main study. We then estimated 2 additional PSs in the external validation study: the “error-prone” PS, based on the information available in the main cohort, and the “gold-standard” PS, which also included covariates available only in the validation study (see the figure below). Based on these 2 PSs in the validation study, we applied regression calibration to correct the regression coefficients in the main cohort.36
Concept of propensity score calibration
To test this approach, we identified a main study population assembled for an analysis of pain medication use in elderly patients.40
It comprised all community-dwelling New Jersey residents who were aged 65 years or older, filled prescriptions within Medicaid or the Pharmaceutical Assistance to the Aged and Disabled program, and were hospitalized between January 1, 1995, and December 31, 1997. Eligible individuals were those who filled a prescription for any drug within 120 days before hospitalization and another prescription more than 365 days before hospitalization. Covariates were assessed during the 365 days before hospitalization.
For all 103,133 eligible subjects we extracted the following variables: age, sex, race, all prescriptions filled within 120 days before the date of hospital admission, all diagnoses assigned, number of hospitalizations, and number of physician visits within 365 days before that date. The time until death or 365 days of follow-up (whichever came first) was assessed starting from the date of hospital admission, based on linkage to Medicare files.41
External validation study
The Medicare Current Beneficiary Survey (MCBS) is conducted in a sample of beneficiaries selected each year to be representative of the current Medicare population, including both aged and disabled beneficiaries living in the community or in institutions. Data, including medication use over the last 4 months verified by inspection of medication containers, are obtained from face-to-face interviews and linked to Medicare claims data. The survey has a high response rate (between 85% and 95%) and very high data completeness.42
The MCBS data used for the validation study in this analysis were drawn from a list of all persons enrolled in Medicare on January 1, 1999. As in our main study, the validation study population was restricted to persons aged 65 years or older living in the community (10,446 persons). To make the validation study population more comparable with the main study, we randomly selected MCBS individuals according to the age (3 categories) and sex distribution in the main cohort (frequency matching). This resulted in 5,108 MCBS subjects used for all subsequent analyses.
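Frequency matching of this kind can be sketched as stratified random sampling of the validation pool. The text does not state how the size of the matched sample was chosen, so the function below assumes, as a hypothetical rule, the largest sample the pool can support while reproducing the main cohort's age-by-sex distribution; the function name and the stratum labels are illustrative only.

```python
import random
from collections import Counter

def frequency_match(main_strata, pool, seed=1):
    """Randomly sample `pool` so its stratum distribution matches `main_strata`.

    main_strata: stratum labels, e.g. (age_group, sex), for main-study subjects
    pool: (subject_id, stratum) pairs for validation-study candidates
    Returns selected subject ids. Uses the largest total sample the pool can
    support (an assumed rule; the study may have used another target size).
    """
    rng = random.Random(seed)
    share = {s: c / len(main_strata) for s, c in Counter(main_strata).items()}
    by_stratum = {}
    for sid, s in pool:
        by_stratum.setdefault(s, []).append(sid)
    # Largest N such that N * share[s] <= number available in every stratum.
    n_total = min(len(by_stratum.get(s, [])) / p for s, p in share.items())
    selected = []
    for s, p in share.items():
        k = int(n_total * p)  # round down so no stratum is oversampled
        selected.extend(rng.sample(by_stratum.get(s, []), k))
    return selected
```

Strata present in the pool but absent from the main cohort are simply never sampled, and the stratum with the scarcest supply relative to its main-cohort share determines the overall sample size.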
Control for observed confounding
During a follow-up period of 1 year, 21,928 (21.3%) of the main study population died during or after hospitalization. Without any control for confounding, NSAID use appeared to be associated with a 32% (95% CI: 29 – 36%) mortality risk reduction (Table 3). This observed association is likely to be due to selection bias (ie, the fact that physicians are less likely to treat pain with NSAIDs in frail old adults). Controlling for just age and gender in the conventional outcome model, we observed a smaller estimate of decreased risk (26%; 95% CI: 23 – 29%) compared with the unadjusted result. Controlling for a wide variety of health indicators available in claims data (for a list, see footnote) in the outcome model, we observed a risk reduction of 20% (95% CI: 17 – 23%). We observed essentially the same risk reduction (19%; 95% CI: 16 – 22%) when we instead controlled for confounding with a PS estimated from the same claims data covariates used in the outcome model, modeling mortality as a function of the exposure and the estimated PS.
Table 3 Association between non-steroidal anti-inflammatory drug use and 1-year mortality in a population-based cohort of 103,133 elderly - propensity score calibration adjustment based on data from 5,108 participants of the Medicare Current Beneficiary Survey
Control for unobserved confounding with Propensity Score Calibration
We implemented the PSC approach by using regression calibration to correct for measurement error in the PS of the main study. Regression calibration was based on the estimation of the 2 PSs (“error-prone” and “gold-standard”) in the external validation study (see the figure below). Better prediction of exposure by inclusion of the survey information in the “gold-standard” PS resulted in an increased c-statistic, or area under the receiver operating characteristic curve (AUC), of 0.66, compared with 0.60 for the “error-prone” PS. None of the additional variables entered in the “gold-standard” PS is strongly related to exposure to NSAIDs. Nevertheless, the prediction of NSAID exposure is substantially improved, as quantified by the c-statistic, lending plausibility to our hypothesis that the strong unmeasured confounding in the main study is not due to a single dominant confounder but rather to the joint effect of multiple modest confounders. In general, however, the c-statistic of the PS has only limited value when assessing its performance to control for confounding.43
Propensity of non-steroidal anti-inflammatory drug use in the main study population and the external validation study (modified from reference 29)
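The c-statistic used above to compare the two PSs is the probability that a randomly chosen exposed subject has a higher PS than a randomly chosen unexposed subject, with ties counted as one half. A minimal sketch of the rank-based (Mann-Whitney) computation follows; the function name is ours, and the example inputs are hypothetical rather than study data.

```python
def c_statistic(scores, exposed):
    """Area under the ROC curve of `scores` for predicting binary `exposed`.

    Rank-based (Mann-Whitney) formula; tied scores receive their average rank,
    which counts each exposed/unexposed tie as one half.
    """
    pairs = sorted(zip(scores, exposed))
    n = len(pairs)
    rank_sum_exposed = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_exposed += midrank * sum(e for _, e in pairs[i:j])
        i = j
    n1 = sum(exposed)
    n0 = n - n1
    return (rank_sum_exposed - n1 * (n1 + 1) / 2.0) / (n1 * n0)

# Hypothetical PS values for four subjects, two of them exposed.
print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

A value of 0.5 means the score does no better than chance at separating users from non-users, so the move from 0.60 to 0.66 reported above reflects a modest but real gain in discrimination.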
We implemented regression calibration using a linear measurement error model. The “gold-standard” PS, which predicted NSAID exposure based on claims data combined with interview data, was the dependent variable; the “error-prone” PS, which predicted NSAID exposure based on claims data only, was one independent variable, and NSAID exposure was the second. As noted above, the analysis of the exposure-outcome association controlling for the “error-prone” PS in the main study indicated a 19% risk reduction. When we adjusted this estimate for the measurement error in the “error-prone” PS (as estimated by comparison with the “gold-standard” PS in the validation study), we found NSAID use to be associated with a very small 6% (95% CI: 0 to 11%) increase in mortality risk after PSC (Table 3). Thus, adding information on additional confounders using PSC resulted in a more plausible estimate for the association between NSAID use and all-cause short-term mortality in the elderly.36
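The correction described in this paragraph can be sketched in a few lines, assuming the main-study outcome model yields a naive exposure coefficient and a coefficient for the error-prone PS on the same (eg, log hazard ratio) scale. The function name and all numbers below are hypothetical and do not reproduce Table 3; confidence intervals, which would require the delta method or bootstrapping, are not shown.

```python
import numpy as np

def psc_corrected_log_effect(b_exposure, b_ps, ps_error, ps_gold, exposure):
    """Propensity score calibration via regression calibration (sketch).

    b_exposure, b_ps: naive main-study coefficients (eg, log hazard ratios) for
        the exposure and for the error-prone PS.
    ps_error, ps_gold, exposure: arrays from the validation study.
    Returns the corrected exposure coefficient on the same scale.
    """
    # Linear measurement error model: ps_gold ~ 1 + ps_error + exposure.
    X = np.column_stack([np.ones_like(ps_error), ps_error, exposure])
    (g0, g_ps, g_e), *_ = np.linalg.lstsq(X, ps_gold, rcond=None)
    # The naive fit implies b_ps = beta_gold * g_ps and
    # b_exposure = beta_exposure + beta_gold * g_e; solve for beta_exposure.
    return b_exposure - b_ps * g_e / g_ps

# Hypothetical validation data in which users have a somewhat higher
# "gold-standard" PS than their claims-only PS suggests.
rng = np.random.default_rng(42)
ps_error = rng.uniform(0.05, 0.4, 5000)
exposure = rng.binomial(1, ps_error)
ps_gold = np.clip(0.02 + 0.9 * ps_error + 0.05 * exposure
                  + rng.normal(0, 0.03, 5000), 0.01, 0.99)
b = psc_corrected_log_effect(np.log(0.81), -2.0, ps_error, ps_gold, exposure)
print(np.exp(b))  # corrected hazard ratio under these made-up inputs
```

With a negative naive PS coefficient (a higher propensity for NSAID use marks healthier patients) and a positive exposure term in the measurement error model, the correction moves the protective estimate toward or past the null, which is the direction reported above.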
Like regression calibration, PSC is dependent on the surrogacy assumption that the “error-prone” PS does not contain any information on the disease outcome given the “gold-standard” PS and exposure.39,44
In simulations over a wide range of parameters, we found that PSC is valid if surrogacy holds, but that it can increase rather than decrease bias in situations where surrogacy is violated.45
Surrogacy holds when the directions of confounding by the measured and the unmeasured covariates are the same. In the NSAIDs example, it is plausible that the underlying frailty leading both to a lower propensity for NSAID use and to higher mortality is only partly captured in the main study and is better captured when additional information from the validation study (eg, data on activities of daily living) is added. Thus, surrogacy might be assumed in this setting. Under assumptions similar to surrogacy, the “gold-standard” PS also acts as an approximate instrument, because it is expected to approximate the true, but unknown, propensity of treatment more closely than the “error-prone” PS does.39,46