

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Am J Epidemiol. Author manuscript; available in PMC 2006 August 1.
Published in final edited form as:
PMCID: PMC1444885

Adjusting effect estimates for unmeasured confounding with validation data using propensity score calibration


Often important confounders are not available in studies. Sensitivity analyses based on the relation of single, but not multiple, unmeasured confounders with an exposure of interest in a separate validation study have been proposed. The authors controlled for measured confounding in the main cohort using propensity scores (PS) and addressed unmeasured confounding by estimating two additional PS in a validation study. The ‘error-prone’ PS exclusively used information available in the main cohort. The ‘gold-standard’ PS additionally included covariates available only in the validation study. Based on these two PS in the validation study, regression calibration was applied to adjust regression coefficients. This propensity score calibration (PSC) adjusts for unmeasured confounding in cohort studies with validation data under certain, usually untestable, assumptions. PSC was used to assess nonsteroidal antiinflammatory drugs (NSAID) and 1-year mortality in a large cohort of elderly. ‘Traditional’ adjustment resulted in a relative risk (RR) in NSAID users of 0.80 (95% confidence interval: 0.77–0.83) compared to an unadjusted RR of 0.68 (0.66–0.71). Application of PSC resulted in a more plausible RR of 1.06 (1.00–1.12). Until validity and limitations of PSC have been assessed in different settings, the method should be seen as a sensitivity analysis.

Keywords: epidemiologic methods, research design, confounding factors (epidemiology), bias (epidemiology), cohort studies, propensity score calibration
Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; NSAID, nonsteroidal antiinflammatory drug; OR, odds ratio; PS, propensity score; PSC, propensity score calibration; RR, relative risk


Unmeasured confounding can be a major source of bias in observational research. Cohort studies often lack measures of important potential confounders, such as smoking and body mass index in pharmacoepidemiologic studies using claims data, or laboratory or blood pressure measurements in questionnaire-based studies. Various methods have been proposed to assess the sensitivity of observed associations to the possible effect of unobserved confounders (1–6). Recent studies on the side effects of drugs have taken this research one step further by considering external data describing the distributions of confounders in a validation sample and their association with disease based on the medical literature. These can be used to adjust estimates of the association observed in the main study (7,8). None of these approaches, however, takes the joint distribution of multiple unobserved confounders into account. The latter methods also treat the external distributions of confounders as known when in fact they are estimates, usually made from small to moderate-sized samples.

Our objective was to propose a method to adjust for multiple unmeasured confounders in a cohort study. This method incorporates information from an external validation sample to calibrate the propensity score (PS) used to adjust for confounding. After a brief introduction to the background of PS and regression calibration (RC), we develop the rationale for our proposed Propensity Score Calibration (PSC) approach and illustrate its application in a cohort study on the association between nonsteroidal antiinflammatory drugs (NSAID) and 1-year mortality in the elderly.


Propensity Scores

PS is defined as the conditional probability of exposure to a drug or other potential risk factor given observed covariates (9). Each subject has a vector of observed covariates, X, and an indicator of exposure (or treatment), E=1 if exposed and E=0 if unexposed. The PS, e(X), is the probability of exposure for a person with covariates X, i.e.

$$e(X) = \Pr(E = 1 \mid X). \qquad (1)$$

This PS is usually estimated from the data at hand using multivariable logistic regression. Individuals with the same estimated PS are then thought to have the same chance of being exposed, although they may have very different X. As a group, however, treated and untreated subjects paired on the same PS will have similar distributions of X (10).
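As a minimal sketch of the estimation step just described, the following pure-Python example fits a logistic model for exposure by gradient ascent and returns each person's estimated PS. It is an illustration only, not the authors' SAS code; the two covariates (centred age in decades and a female indicator), the toy data, and the function names `predict` and `fit_logistic` are all assumptions made for this example.

```python
import math

def predict(coefs, x):
    """Estimated Pr(E = 1 | X = x) from a logistic model (intercept first)."""
    lin = coefs[0] + sum(c * xj for c, xj in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-lin))

def fit_logistic(rows, exposed, steps=5000, rate=0.5):
    """Fit the logistic PS model by plain gradient ascent on the log-likelihood."""
    n = len(rows)
    coefs = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(coefs)
        for x, e in zip(rows, exposed):
            resid = e - predict(coefs, x)      # score contribution of one person
            grad[0] += resid
            for j, xj in enumerate(x):
                grad[j + 1] += resid * xj
        coefs = [c + rate * g / n for c, g in zip(coefs, grad)]
    return coefs

# Hypothetical toy data: (centred age in decades, female indicator).
covariates = [(-1.0, 0), (-0.5, 1), (0.0, 0), (0.5, 1),
              (1.0, 0), (1.5, 1), (-1.5, 1), (2.0, 0)]
exposure = [0, 1, 1, 1, 0, 0, 1, 0]

coefs = fit_logistic(covariates, exposure)
ps = [predict(coefs, x) for x in covariates]   # one estimated PS per person
```

In practice a statistical package would be used for this fit; the loop above only makes explicit that the PS is the fitted probability of exposure given the covariates.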

Once a PS is estimated, it can be used in different ways to control for confounding in cohort studies. Applications include matching on the estimated PS, multivariate adjustment by sub-classification on the estimated PS, or modeling of the estimated PS-outcome association, and combinations of these methods with ‘traditional’ multivariable outcome modeling (11). In the current context, we will consider modeling of the estimated PS-outcome association in a Cox proportional hazards model

$$h(t \mid E, X) = h_0(t) \exp\{\beta_E E + \beta_X e(X)\}, \qquad (2)$$

where $h_0(t)$ is the baseline hazard.

This model is used to allow the application of the proposed PSC, although we do not suggest that such a model would be preferable to a model including all covariates or other ways to use the estimated PS. In theory, conditioning on the estimated PS should lead to exchangeability of exposed and unexposed subjects and thus yield unconfounded treatment estimates in any relative risk or hazard model, as long as there is no unmeasured confounding. Use of the estimated PS in a Cox proportional hazards model has been described by D’Agostino (12) and previously applied by Muller et al. (13).

Regression Calibration

In the context of generally sparse use of methods to correct effect estimates for measurement error in epidemiology, RC is the most widely used approach (14–17). It estimates a linear measurement error model of the true or gold-standard variable for one measured with error in a validation study, and uses the resulting regression parameters ($\hat{\lambda}$) to correct the ‘naive’ regression estimates ($\hat{\beta}$) obtained from error-prone covariate data in the main study.

Propensity Score Calibration

A PS estimated when additional confounders are unobserved can be viewed as a variable measured with error, either due to the lack of information on important predictors of the exposure (unmeasured variables) or due to imperfectly measured predictors.

A validation study usually includes a ‘gold-standard’ measure of the variable measured with error in the main study, in addition to the error-prone measurement (already available in the main study). Since the PS is estimated without information concerning outcomes, a cross-sectional validation study is sufficient to assess the error in the main study PS. This can be achieved by directly comparing a PS containing the same information as the main study PS (i.e. error-prone PS) with the validation study PS that includes additional important determinants of the exposure of interest that are unobserved in the main study. The latter estimated PS can be seen as the ‘gold-standard’ PS whereas the former is the error-prone PS identical in its determinants to the error-prone PS in the main study.

Once the error in the main study PS is estimated in the validation study, regression calibration can be used to adjust the main study PS for that error. This will also adjust estimates of the effect of the exposure of interest on disease risk for bias due to residual confounding that results from the mismeasured PS.

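Following the general notation introduced for the PS, we would define

$$e(X_{EP}) = \Pr(E = 1 \mid X_{EP}) \qquad (3)$$

as the error-prone variable and

$$e(X_{GS}) = \Pr(E = 1 \mid X_{GS}) \qquad (4)$$

as the gold-standard, with XGS generally being an expanded set of covariates including XEP. The measurement error model then is

$$\operatorname{E}[e(X_{GS}) \mid E, e(X_{EP})] = \lambda_0 + \lambda_E E + \lambda_X e(X_{EP}). \qquad (5)$$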

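If only one covariate is present, the estimate adjusted for measurement error is very easily obtained by dividing the ‘naive’ regression estimate in the main study by the regression parameter from the measurement error model in the validation sample ($\hat{\beta}/\hat{\lambda}$). For two independent variables in the Cox proportional hazards outcome model, one measured with error (the PS) and one without (the exposure), the target model would be

$$h(t \mid E, X_{GS}) = h_0(t) \exp\{\beta_E^{*} E + \beta_X^{*} e(X_{GS})\} \qquad (6)$$

if information on all covariates were available in the main cohort, with $h_0(t)$ the baseline hazard. Writing $\hat{\beta}_E$ and $\hat{\beta}_X$ for the naive main-study coefficients of E and the error-prone PS, and $\hat{\lambda}_E$ and $\hat{\lambda}_X$ for the coefficients of E and the error-prone PS in the measurement error model, the RC (15) adjusted estimator for the effect of E then is

$$\hat{\beta}_E^{*} = \hat{\beta}_E - \hat{\lambda}_E \hat{\beta}_X / \hat{\lambda}_X \qquad (7)$$

and the adjusted estimator for e(X), the effect of the true PS, is

$$\hat{\beta}_X^{*} = \hat{\beta}_X / \hat{\lambda}_X. \qquad (8)$$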

Adjusted estimates for the variances can then be calculated, accounting for the additional uncertainty caused by the estimation of λ in the validation study (15). The method has been developed for cohort studies and produces approximately consistent estimates if the measurement error variance is not too large. In case of an ‘alloyed’ gold-standard, non-differential error is assumed. The resulting estimates are still consistent if the ‘alloyed’ gold-standard is unbiased for the true gold-standard (18). RC leads to approximate consistency in Cox proportional hazards models and can therefore be used to remove most of the bias due to measurement error (16,19). The method is easily applicable using a SAS macro that can be obtained from Dr. Donna Spiegelman, Harvard School of Public Health (stdls@channing.harvard.edu).
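The point-estimate part of the correction described in this section can be sketched in a few lines. The example assumes a linear error model in which the gold-standard PS is regressed on the error-prone PS (slope `lam_x`) and the exposure (slope `lam_e`); the function name `rc_correct` and the numeric inputs are hypothetical, and the variance adjustment performed by the SAS macro is deliberately not reproduced here.

```python
def rc_correct(beta_x, beta_e, lam_x, lam_e):
    """Regression-calibration point estimates for two covariates.

    beta_x, beta_e: naive main-study coefficients of the error-prone PS
                    and of the exposure.
    lam_x, lam_e:   validation-study slopes of the gold-standard PS on the
                    error-prone PS and on the exposure.
    Returns (corrected PS coefficient, corrected exposure coefficient).
    """
    if lam_x == 0:
        raise ValueError("error-prone PS carries no information (lam_x = 0)")
    beta_x_star = beta_x / lam_x
    beta_e_star = beta_e - lam_e * beta_x_star
    return beta_x_star, beta_e_star

# Hypothetical inputs, chosen only to show the direction of the adjustment.
beta_x_star, beta_e_star = rc_correct(beta_x=-6.0, beta_e=-0.2,
                                      lam_x=0.8, lam_e=0.05)
```

With these made-up inputs the exposure coefficient moves from −0.2 to 0.175 on the log hazard scale, which illustrates how an apparently protective estimate can cross the null after calibration.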

Methods and Results

Study Populations

Main study population

To test this approach, we identified a main study population assembled for an analysis of pain medication use in the elderly. It consists of all community dwelling New Jersey residents who were 65 years or older, filled prescriptions within Medicaid or the Pharmaceutical Assistance to the Aged and Disabled (PAAD) program, and who were hospitalized any time between January 1, 1995 and December 31, 1997. Eligible individuals were those who filled a prescription for any drug within 120 days before hospitalization and another prescription > 365 days before hospitalization, since covariates were assessed during that time period.

For all subjects we extracted the following variables: age, sex, race, all prescriptions filled within 120 days before the index date, all diagnoses assigned within 365 days before the index date, number of hospitalizations, and number of physician visits.

The time until death or 365 days of follow-up (whichever came first) was assessed starting from the date of hospital admission, based on linkage to Medicare files (20).
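The follow-up rule above can be written out as a small sketch. The function name `follow_up` is hypothetical, and treating a death on exactly day 365 as an event is an assumption of this example, since the text does not say how that boundary case was handled.

```python
from datetime import date

MAX_FOLLOW_UP_DAYS = 365

def follow_up(admission, death=None):
    """Return (days of follow-up, death indicator) from hospital admission.

    Follow-up ends at death or after 365 days, whichever comes first;
    people alive at 365 days are censored (indicator 0).
    """
    if death is None:
        return MAX_FOLLOW_UP_DAYS, 0
    days = (death - admission).days
    if days < 0:
        raise ValueError("death date precedes admission date")
    if days > MAX_FOLLOW_UP_DAYS:
        return MAX_FOLLOW_UP_DAYS, 0
    return days, 1

early_death = follow_up(date(1995, 1, 1), date(1995, 1, 11))
late_death = follow_up(date(1995, 1, 1), date(1996, 6, 1))
survivor = follow_up(date(1995, 1, 1))
```

The two returned values play the roles of the time and event variables in a Cox proportional hazards model.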

Validation study

The Medicare Current Beneficiary Survey (MCBS) is conducted in a sample of beneficiaries selected each year to be representative of the current Medicare population, including both aged and disabled beneficiaries living in the community or in institutions. Data, including medication use over the last 4 months verified by inspection of medication containers, are obtained from face-to-face interviews and linked to Medicare claims data. The survey has a high response rate (between 85% and 95%) and very high data completeness (21–23).

The MCBS data used for the validation study in this analysis were drawn from a list of all persons enrolled in Medicare on January 1, 1999. As in our main study, the validation study population was restricted to persons aged 65 years or over living in the community (10,446). To make the validation study population more comparable to the main study, we randomly selected MCBS individuals so that their age (3 categories) and sex distribution matched the one observed in the main cohort (frequency matching). This resulted in 5,108 MCBS subjects used for all subsequent analyses.
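One way to carry out such frequency matching is sketched below. The text does not describe the exact sampling algorithm, so this is an assumption: the sketch finds the largest sample in which every age-by-sex stratum can supply its target share, then draws that many people at random from each stratum. The function name, the stratum labels, and the toy counts are hypothetical (the study used three age categories; two are used here for brevity).

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction

def frequency_match(pool, target_counts, key, seed=0):
    """Sample from `pool` so stratum proportions follow `target_counts`."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for person in pool:
        by_stratum[key(person)].append(person)
    total = sum(target_counts.values())
    # Largest overall size for which no stratum runs out of candidates.
    size = min(Fraction(len(by_stratum[s]) * total, c)
               for s, c in target_counts.items() if c > 0)
    sample = []
    for stratum, count in target_counts.items():
        k = int(size * count / total)          # exact arithmetic, then floor
        sample.extend(rng.sample(by_stratum[stratum], k))
    return sample

# Hypothetical main-cohort distribution and validation pool.
strata = [("65-74", "F"), ("65-74", "M"), ("75+", "F"), ("75+", "M")]
main_counts = dict(zip(strata, [30, 10, 40, 20]))
pool = [{"id": 20 * j + i, "age_group": a, "sex": s}
        for j, (a, s) in enumerate(strata) for i in range(20)]

def stratum_of(person):
    return (person["age_group"], person["sex"])

matched = frequency_match(pool, main_counts, stratum_of)
matched_counts = Counter(stratum_of(p) for p in matched)
```

Here the "75+" women are the limiting stratum, so the 80-person pool yields a matched sample of 50 with the same 30:10:40:20 proportions as the target.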

Table 1 describes the main study population of 103,133 as well as the validation study of 5,108. Due to matching, the age and sex distributions are very similar. The main study population is more diverse with respect to race than the validation study. The main differences are observed with respect to comorbidity, which is much more prevalent in the main study compared to the validation study. Accordingly, individuals in the main study population also have more physician visits and hospitalizations. NSAID use is more prevalent in the main study (17.8%) than in the validation study (12.1%).

Table 1
Description of the main study population and the external validation study

Propensity of NSAID Use

Main study

We first estimated the propensity of NSAID use during the last four months (yes/no) in the main study (claims dataset) using logistic regression. Since over 18,000 individuals used NSAID, no variable selection was conducted; instead, all covariates presented in table 1 were entered into the model.

Validation study

We then estimated two different PS in the validation study, using similar logistic regression models: the first model used identical covariates as in the main study, and a second model used these covariates plus additional information available only in the validation study. The first model corresponds to the error-prone PS model that lacks important confounders, and the second to the ‘gold-standard’ PS. For the ‘gold-standard’ PS, we addressed unmeasured confounding and imperfectly measured confounding in two ways. To address unmeasured confounding, the same variables as in the error-prone PS were used in combination with variables not available in the main study (i.e. smoking, body mass index, activities of daily living, education, and income). To additionally address residual confounding due to imperfectly measured confounders (24), we used self-report (‘Have you ever been told that you had …’) of lifetime rheumatoid arthritis or osteoarthritis in addition to the claims data diagnostic codes for arthritis. We hypothesized that self-report would be more predictive of NSAID use than claims data alone. The predictive value of each PS model was estimated using the area under the receiver operating characteristic curve (AUC) (25).

Determinants of NSAID use

The results from these four different PS models are presented in table 2. On the left, we present the logistic regression results from the main study using claims data only. The three columns on the right describe the results of different models for the propensity of NSAID use in the validation study. The ‘error-prone’ PS model follows the PS model of the main study, using the same variables. The ‘unmeasured’ PS model uses the same variables along with added variables not available in the main dataset. Finally, the ‘un- and imperfectly measured’ PS model contains the same variables as the ‘unmeasured’ PS model plus self-report of arthritis in addition to the claims data diagnoses of arthritis.

Table 2
Propensity of NSAID use in the main study population and the external validation study (error-prone and ‘gold-standard’)

Predictive value for exposure

The AUC of the PS in the validation study using claims data only (0.60) is similar to the one from the main study (0.63), indicating only a modest capacity to predict NSAID use. Adding information on body mass index and performance on activities of daily living (as well as education, income, and smoking status that are less pronounced predictors of NSAID use) increases the predictive value to 0.66. Adding self-report of arthritis, which is a strong predictor of NSAID use (OR = 4.1; 95% CI: 3.1 – 5.5) independent of the claims-data diagnosis of arthritis, further increases the predictive value of the PS model (AUC = 0.71).
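For readers unfamiliar with the AUC, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen exposed person has a higher estimated PS than a randomly chosen unexposed person, with ties counted as one half. The function name and the four toy scores are hypothetical; the quadratic loop is meant for illustration and would be replaced by a rank-based formula in a cohort of this size.

```python
def auc(scores, exposed):
    """Area under the ROC curve via pairwise comparison of PS values."""
    pos = [s for s, e in zip(scores, exposed) if e == 1]
    neg = [s for s, e in zip(scores, exposed) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one exposed and one unexposed person")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # exposed person ranked higher
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

toy_auc = auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, which is why values of 0.60 to 0.71 indicate only modest prediction of NSAID use.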

Unadjusted association between NSAID use and mortality

During the follow-up period of 1 year, 21,928 (21.3%) hospitalized patients died, either during hospitalization or afterwards. Without any control for confounding, NSAID use appears to be associated with a 32% (95% CI: 29 – 34%) mortality risk reduction (table 3). There is no known biological reason to expect that NSAID use would cause a reduction in the risk of death. Instead, the observed association is likely to be due to selection bias, i.e. the fact that physicians are more likely to treat symptomatic pain with narcotic agents rather than NSAID in patients who are moribund.

Table 3
Association between NSAID use and 1-year mortality in a population-based cohort of 103,133 elderly - propensity score calibration adjustment based on data from 5,108 participants of the Medicare Current Beneficiary Survey as external cross-sectional validation ...

Control for observed confounding

Table 3 also describes the association between NSAID use and 1-year mortality from Cox proportional hazards models using various approaches to control for observed confounding. Controlling for age and gender in the ‘traditional’ outcome model, we observe a risk reduction somewhat closer to the null (26%; 95% CI: 23 – 29%) compared to the unadjusted result. Controlling for all variables presented in table 1 in the outcome model, we observe a risk reduction of 20% (95% CI: 17 – 23%). Essentially the same amount of relative risk reduction (19%; 95% CI: 16 – 22%) is observed when applying PS methods (i.e. modeling the incidence of disease as a function of the estimated probability of exposure as a continuous variable and unique confounder together with the exposure).

Propensity Score Calibration


We implemented the PSC approach by using RC to correct for measurement error in the PS of the main study. The SAS macro ‘%blinplus’ takes as input the validation study data containing the two estimated PS (error-prone and ‘gold-standard’) together with the parameter estimates, their standard errors, and covariances from the Cox proportional hazards model that uses the error-prone PS in the main study. The macro output provides the adjusted relative hazard estimates, including 95% confidence intervals that account for the additional uncertainty from the estimation of the error model in the validation study. We present the corresponding programming steps and a sample output in appendix 1.
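The validation-study step can be illustrated with a short sketch that fits a linear error model of the ‘gold-standard’ PS on the error-prone PS and the exposure by ordinary least squares. This is not the source of the ‘%blinplus’ macro; the function name `fit_error_model`, the toy data, and the "true" slopes used to generate them are assumptions for illustration.

```python
def fit_error_model(x_gs, x_ep, e):
    """Least-squares fit of x_gs = lam_0 + lam_x * x_ep + lam_e * e.

    Returns (lam_0, lam_x, lam_e), solving the 3x3 normal equations by
    Gaussian elimination with partial pivoting.
    """
    rows = [(1.0, float(x), float(ei)) for x, ei in zip(x_ep, e)]
    k = 3
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, x_gs)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    lam = [0.0] * k
    for i in reversed(range(k)):            # back substitution
        tail = sum(a[i][j] * lam[j] for j in range(i + 1, k))
        lam[i] = (b[i] - tail) / a[i][i]
    return tuple(lam)

# Hypothetical validation data generated without noise from known slopes.
x_ep = [0.10, 0.20, 0.30, 0.15, 0.25, 0.40]
exposed = [0, 0, 0, 1, 1, 1]
x_gs = [0.02 + 0.8 * x + 0.05 * ei for x, ei in zip(x_ep, exposed)]

lam_0, lam_x, lam_e = fit_error_model(x_gs, x_ep, exposed)
```

The fitted slopes are the quantities that regression calibration combines with the main-study coefficients; a slope for the error-prone PS close to zero would signal that the calibration is unreliable.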

Application using MCBS data as external validation study

Using the MCBS as an external validation study, the application of the PSC approach results in estimates for the association between NSAID use and 1-year mortality closer to the null and even slightly beyond the null (table 3). When using the ‘unmeasured’ confounding model including the claims data variables plus interview data as the gold-standard, the relative risk reduction with NSAID diminishes to just 8% (95% CI: 4 – 12%). Finally, when we use the self-report of arthritis in addition to the claims data diagnostic code in the ‘gold-standard’ model to address confounding due to unmeasured and imperfectly measured covariates, NSAID use is associated with a 6% (95% CI: 0 to 12%) increased mortality risk.

Simulation study

To assess the performance of PSC, we conducted a simulation study. Although simulating a confounder requires information on the outcome, PSC simulation results reported ignore this hypothetical outcome and therefore apply to the external validation study design. Starting from the observed cohort, we simulated 1,000 studies of 103,133 participants each, adding a single dichotomous confounder that is inversely associated with the exposure and a risk factor for mortality. For this simulated confounder, we assumed the following expected prevalences: unexposed and alive: 0.5, unexposed and dead: 0.75, exposed and alive: 0.25, and exposed and dead: 0.5. These parameters resulted in an overall prevalence of the confounder of 51%, an exposure-confounder odds ratio of 0.33, and a hazard ratio for mortality (independent of the exposure) of 2.7. In each study, we then randomly sampled participants into an internal validation study with a selection probability of 5%. We controlled for confounding, including the simulated confounder, by adjusting for the estimated gold-standard PS in the Cox proportional hazards model to obtain an estimate of the unobservable truth. Then we applied PSC using only the 5% internal validation study to estimate the error in the error-prone PS and to apply RC to the whole cohort estimates based on the error-prone PS. The median estimate using PSC (RR = 1.04, see table 4) is virtually identical to the median estimate from the unobservable truth (RR = 1.03). The increased width of the empirical 95% confidence interval of the PSC reflects the additional uncertainty introduced by the 5% validation study. The asymmetry towards higher values indicates a slight tendency towards over-adjustment.
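The confounder-generating step of the simulation can be sketched as follows, using the four expected prevalences stated above. The function names, the random seed, and the four-person toy cohort are hypothetical, and the sketch covers only the drawing of the confounder and of the 5% validation subsample, not the PS or Cox modeling.

```python
import random

# Expected prevalence of the simulated confounder by (exposed, died).
PREVALENCE = {(0, 0): 0.50, (0, 1): 0.75, (1, 0): 0.25, (1, 1): 0.50}

def odds(p):
    return p / (1.0 - p)

def exposure_confounder_or(died):
    """Exposure-confounder odds ratio implied within one outcome stratum."""
    return odds(PREVALENCE[(1, died)]) / odds(PREVALENCE[(0, died)])

def simulate_confounder(exposed, died, rng):
    """Draw the dichotomous confounder for one cohort member."""
    return 1 if rng.random() < PREVALENCE[(exposed, died)] else 0

def in_validation_sample(rng, probability=0.05):
    """Select a cohort member into the internal validation study."""
    return rng.random() < probability

rng = random.Random(2005)                      # arbitrary seed
toy_cohort = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (exposed, died) pairs
confounder = [simulate_confounder(e, d, rng) for e, d in toy_cohort]
validation = [in_validation_sample(rng) for _ in toy_cohort]

or_alive = exposure_confounder_or(died=0)
or_dead = exposure_confounder_or(died=1)
```

Both stratum-specific odds ratios equal one third, in line with the exposure-confounder odds ratio of 0.33 reported in the text.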

Table 4
Simulation study of propensity score calibration: association between NSAID use and 1-year mortality in a population-based cohort of 103,133 elderly with an additional simulated dichotomous risk factor for mortality (RR = 2.7) inversely associated with ...


Discussion

We have combined for the first time two existing epidemiologic methods, propensity scores and regression calibration, to control for unmeasured confounding in cohort studies using validation data. By taking the joint distribution of unmeasured confounders into account, the PSC extends prior work using sensitivity analyses and adjustments based on single confounders (1–8). In our example, this novel approach appeared effective in controlling for unmeasured confounding by virtually eliminating the presumably spurious ‘protective’ effect of NSAID on mortality. In addition, the amount of residual confounding due to unmeasured and poorly measured covariates was important enough to qualitatively change the likely spurious association between NSAID use and all-cause mortality. The proposed strategy is easy to conceptualize and implement using user-friendly, well-documented, and available software.

PSC needs to be compared to other methods to address unobserved confounding in non-experimental research (1–8). Others have proposed sensitivity analyses addressing the effect of several confounders by using external information on the relation of unobserved confounders with the exposure and disease of interest (8). To our knowledge, none of these methods can sufficiently address the joint effect of several unobserved confounders. Another important advantage of PSC is that it is not dependent on outcome information in the validation study, i.e. it can be applied using cross-sectional external validation studies without having to specify the confounder-disease association. In this setting, imputation of gold-standard PS values in the main study from the error-prone PS, exposure and the regression parameters from the error-model in the validation study would essentially be a form of RC. It has recently been shown that these are algebraically identical (26), but estimation of adjusted standard errors taking the uncertainty in the error-regression model into account would need to rely on bootstrapping or derivation and programming of asymptotic variances (14,27).

In the setting of an internal validation study, i.e. when information on an extended set of covariates and the outcome is available for a subsample of the main study, unobserved confounding in the main study can be seen as a missing covariate problem. In this context, methods to address missing data, including multiple imputation (28) and Bayesian methods (29,30) may be applied.

We present a single example to illustrate the applicability of the analytic strategy and to encourage further research into its validity and limitations in different settings. Many of the issues regarding measurement error and validation samples have been addressed previously, and are likely to apply to PSC as well. Accordingly, the proposed method is approximately unbiased with the amount of residual bias depending on established theoretical and analytical properties of propensity scores and regression calibration. The combination of these methods could, however, lead to analysis of parameter constellations not previously assessed (17,18,31). Validation studies often are not a perfect random sample of the main study population, but as long as the sampling is representative for the error mechanism, valid estimation and inference will follow using RC even if the marginal distributions of variables in the main study do not match those of the validation study (32). RC approximations break down when the measurement error is large, i.e. when the correlation between the estimated error-prone and ‘gold-standard’ PS is weak. In our example, the correlations with the error-prone PS were moderate (0.56 – 0.68). The sizes of the main and validation studies will have a big impact on the precision of the resulting estimates (33,34), and both were very large in our example. Because the validation study was very large in our example, there was only a minor increase in the width of the confidence interval for the exposure of interest using PSC.

Our approach is easy to implement (see appendix 1). PS are increasingly used in medical research (35) although many issues concerning their optimal implementation are still unresolved. RC is the most widely applied approach to address measurement error in multivariate epidemiologic analyses. Its advantages include the ability to adjust for confounding in the primary regression by continuous, discrete, and ordinal variables, which are assumed to be perfectly measured, as well as the availability of a macro providing adjusted effect estimates and standard errors for a variety of multivariate models.

We chose the example of NSAID and all-cause mortality, since NSAID use is unlikely to lead to a reduced mortality in this elderly population (36). Glynn et al. argued that in an elderly population selected drug classes, including NSAID, are more likely to be prescribed to healthier subjects less close to death (36). NSAID use in the elderly is associated with several adverse outcomes, including increased risk of gastrointestinal hemorrhage (37–40), impaired kidney function (41–43), and hypertension (44,45). Therefore, no association with mortality or a slightly increased risk of mortality seems biologically more plausible than any amount of reduced risk.

The design of validation studies has been extensively addressed (46–50) and these issues are likely to apply to PSC as well. It is important to note that medication use was assessed by prescription claims data in the main study and by interview in the validation study. Ideally, we should have medication use assessed by prescription data in the validation study as well in order to exactly reproduce the error-prone PS in the validation study. Depending on the relative quality of these sources of information on medication use (51), the error in the error-prone PS might have been under- or overestimated somewhat in our example. The observed differences in the prevalence of medication use per se do not invalidate the approach (34).

We used the PS as a continuous variable in our disease and error models, which is dependent on the assumption of a log-linear association with the dependent variable. This assumption can be easily tested (for example by inclusion of categories or quadratic and cubic terms) in the measurement error model (equation 5) and was essentially met in our study. Like regression calibration, PSC is also dependent on the assumption that the error-prone variable is a surrogate for the gold-standard variable, i.e. that the error-prone propensity score is independent of disease given the gold-standard propensity score and exposure (27,52, see appendix 2). This strong assumption cannot be tested without information on the outcome, e.g. in an external validation study. If this assumption is violated, PSC can be far less useful and even counter-productive, in that it can increase rather than decrease bias. If information on the outcome is available in the validation study (e.g. in an internal validation study), one can check the independence of the error-prone propensity score and disease given the gold-standard propensity score and exposure. If independence holds in the validation study, the assumption may be considered valid in the context of the main study, too.

The assumption that the gold-standard propensity score has a linear association with the logit of disease given exposure (9) cannot be tested for the target disease model (equation 6). Since the gold-standard propensity score captures all of the confounding in a single covariate, misspecification of its association with the outcome is likely to have pronounced effects on its ability to control for confounding (53).

We did not assess the role of variable selection procedures when estimating the PS. With large datasets it might be best to err on the side of including apparently unimportant variables and also interactions between variables (12), a topic which is beyond the scope of this paper. Age was the only continuous variable in our main study dataset and results were virtually unchanged by including age in categories instead of as a continuous variable in the PS models. The number of exposed individuals in the validation dataset will limit the number of variables that can be used to estimate the PS (and therefore the application of the method) in smaller studies. The confidence intervals of PSC take the uncertainty in the estimation of the error model into account, but not that due to model misspecification or the use of different definitions of some of the covariates, including NSAID use. Confidence intervals should therefore be interpreted only as a rough guide, and a minimum estimate, of the inherent uncertainty (54).

We conclude that the propensity score calibration approach, i.e. the combination of propensity scores and regression calibration, can substantially improve control for confounding by unmeasured and imperfectly measured confounders in cohort studies when internal or external validation data are available. The approach might be especially advantageous for pharmacoepidemiologic research based on claims data. Once the advantages and limitations of the approach have been assessed in a variety of settings and parameter constellations, it might help to address one of the main problems in pharmacoepidemiology using large-scale claims databases, i.e. the lack of information on important confounders, and could lead to improvement in the quality of research in this field.


The authors thank Dr. Kenneth J Rothman for helpful discussions and his valuable suggestions.


Appendix 1

For notation, see methods: e = exposure of interest (e = 1 if the exposure of interest is present, e = 0 if absent), x = error-prone propensity score, xgs = ‘gold-standard’ propensity score.

Program Code

proc logistic data=main descending; * PS in main study *;
  model e = age sex black other /* etc.: remaining covariates */;
  output out=main predicted=x;
run;

proc phreg data=main covout outest=bvarb;
  model pdays*death(0) = x e; * outcome model adjusting for error-prone PS *;
run;

proc logistic data=val descending; * PS in validation study (error-prone) *;
  model e = age sex black other /* etc.: remaining covariates */;
  output out=val2 predicted=x;
run;

proc logistic data=val2 descending; * PS in validation study (gold-standard) *;
  model e = age sex black other /* etc.: remaining covariates */
            bmi edu inc csmk psmk adlsdiff adlunable; * covariates not observed in main study *;
  output out=val3 predicted=xgs;
run;

%include '……\';

data wts; * weights needed for calibration *;
  input name $ weight;
  cards;
x 1.0
e 1.0
;
run;

%blinplus(
  type     = PHREG, /* type can be LOGISTIC, REG, or PHREG */
  valid    = val3,  /* Dataset with Validation Study data */
  main_est = bvarb, /* Dataset containing est.s from main study regress. */
                    /* on the var.s given in err_var (below). */
  weights  = wts,   /* Dataset with weights */
  err_var  = x e,   /* List of the variables measured with error */
  true_var = xgs e, /* List of the variables measured without error */
  depend   = pdays, /* Name of the dependent variable */
  change   = ,      /* Percent Change Scale: T/F (need type=REG for T) */
  internal = f,     /* If validation study is Internal, do you wish to */
                    /* combine estimates? T/F (if T, need to */
                    /* specify val_est argument) */
  val_est  =        /* Dataset containing Internal validation Est.s */
);

Program Output

Main study regression coefficients: Uncorrected
Variable  Weight  B         SE       exp(B)   95% CI             p
x         1.00    −6.01139  0.12320  0.00245  0.00192 – 0.00312  0.00000
e         1.00    −0.21433  0.02010  0.80708  0.77590 – 0.83952  0.00000
Main study regression coefficients: Corrected
Variable  Weight  B         SE       exp(B)   95% CI             p
x         1.00    −6.27872  0.19011  0.00188  0.00129 – 0.00272  0.00000
e         1.00    0.05416   0.02923  1.05566  0.99687 – 1.11791  0.06390
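The numeric columns of the program output appear to be related by standard Wald arithmetic. The short check below is a sketch under that assumption: it treats the first two numeric values after the weight as the coefficient and its standard error, and recomputes the exponentiated coefficient, a 95% confidence interval using 1.96, and a two-sided p value. The function name `summarize` is hypothetical.

```python
import math


def summarize(b, se, z_crit=1.96):
    """Return exp(b), Wald confidence limits, and a two-sided p value."""
    ratio = math.exp(b)
    lower = math.exp(b - z_crit * se)
    upper = math.exp(b + z_crit * se)
    # Two-sided p value from the standard normal distribution
    p = math.erfc(abs(b / se) / math.sqrt(2.0))
    return ratio, lower, upper, p


# Coefficient and standard error for the exposure e, copied from the program output
rows = {
    "e, uncorrected": (-0.21433, 0.02010),
    "e, corrected": (0.05416, 0.02923),
}
for label, (b, se) in rows.items():
    ratio, lower, upper, p = summarize(b, se)
    print(f"{label}: exp(B)={ratio:.5f}, 95% CI {lower:.5f} to {upper:.5f}, p={p:.5f}")
```

Read this way, the corrected exposure row corresponds to the relative risk of 1.06 (1.00–1.12) reported in the abstract, and the uncorrected row to 0.80 (0.77–0.83).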


Proof of PSC in the case of a normally distributed outcome Y according to Carroll, Ruppert and Stefanski (27). Carroll and Stefanski also provide extensions to generalized linear models that are not presented here for the sake of clarity (52).

Suppose we have measures of the continuous outcome Y, the dichotomous exposure of interest E, and a subset of covariates XEP available for the entire population, and the entire set of covariates XGS available for a subgroup. Our target model would be


where e(XGS) is the gold-standard propensity score.

According to the property of conditional expectation, the expected value of Y given the error-prone propensity score e(XEP) can be written as


Under the assumption that e(XEP) is independent of the outcome given e(XGS) and E (i.e. is a surrogate of e(XGS))


which leads to


Thus regression calibration, i.e. the single imputation of E{e(XGS)| E,e(XEP)} to control for confounding of the exposure-disease association in the outcome model, is valid under the assumptions that the outcome Y is a linear function of e(XGS) given E and that e(XEP) is a surrogate for e(XGS) (i.e. is independent of Y given e(XGS) and E).
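The display equations of this proof are not reproduced in this version of the text. The following is a sketch of the standard regression calibration argument that the surrounding sentences describe, in the order in which they refer to it. The coefficient symbols \beta_0, \beta_E and \beta_X are assumptions introduced here for illustration and may differ from the authors' notation; upright \mathrm{E} denotes expectation and italic E the exposure.

```latex
\begin{align}
% target model: outcome linear in exposure and gold-standard PS
\mathrm{E}\{Y \mid E, e(X_{GS})\}
  &= \beta_0 + \beta_E E + \beta_X\, e(X_{GS}) \\
% property of conditional expectation (iterated expectations)
\mathrm{E}\{Y \mid E, e(X_{EP})\}
  &= \mathrm{E}\big[\, \mathrm{E}\{Y \mid E, e(X_{GS}), e(X_{EP})\} \,\big|\, E, e(X_{EP}) \big] \\
% surrogacy: e(X_EP) carries no information on Y beyond e(X_GS) and E
\mathrm{E}\{Y \mid E, e(X_{GS}), e(X_{EP})\}
  &= \mathrm{E}\{Y \mid E, e(X_{GS})\} \\
% combining the three lines above
\mathrm{E}\{Y \mid E, e(X_{EP})\}
  &= \beta_0 + \beta_E E + \beta_X\, \mathrm{E}\{e(X_{GS}) \mid E, e(X_{EP})\}
\end{align}
```

The last line shows why replacing the unobserved gold-standard PS by its conditional expectation given the exposure and the error-prone PS leaves the coefficients of the target model unchanged in the linear case.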


1. Cornfield J, Haenszel W, Hammond EC, et al. Smoking and lung cancer: Recent evidence and a discussion of some questions. J Natl Cancer Inst. 1959;22:173–203.
2. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Statist Soc B. 1983;45:212–8.
3. Rosenbaum PR. Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika. 1987;74:13–26.
4. Rosenbaum PR. Sensitivity analysis for matched case-control studies. Biometrics. 1991;47:87–100.
5. Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54:948–63.
6. Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: Concepts and analytical approaches. Annu Rev Public Health. 2000;21:121–45.
7. Velentgas P, Cali C, Diedrick G, et al. A survey of aspirin use, non-prescription NSAID use, and cigarette smoking among users and non-users of prescription NSAID: estimates of the effect of unmeasured confounding by these factors on studies of NSAID use and risk of myocardial infarction. Pharmacoepidemiol Drug Saf. 2001;10 (Suppl 1):S103.
8. Schneeweiss S, Glynn RJ, Tsai EH, Avorn J, Solomon DH. Adjusting for unmeasured confounders in pharmacoepidemiologic claims data using external information: The example of COX2 inhibitors and myocardial infarction. Epidemiology (in press).
9. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
10. Rubin DB, Thomas N. Matching using estimated propensity scores: relating theory to practice. Biometrics. 1996;52:249–64.
11. Joffe MM, Rosenbaum PR. Invited commentary: propensity scores. Am J Epidemiol. 1999;150:327–333.
12. D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17:2265–81.
13. Muller JE, Turi ZG, Stone PH, et al. Digoxin therapy and mortality after myocardial infarction. Experience in the MILIS Study. N Engl J Med. 1986;314:265–71.
14. Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med. 1989;8:1051–69.
15. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: The case of multiple covariates measured with error. Am J Epidemiol. 1990;132:734–45.
16. Spiegelman D, McDermott A, Rosner B. Regression calibration method for correcting measurement-error bias in nutritional epidemiology. Am J Clin Nutr. 1997;65(Suppl):1179S–86.
17. Fraser GE, Stram DO. Regression calibration in studies with correlated variables measured with error. Am J Epidemiol. 2001;154:836–844.
18. Spiegelman D, Schneeweiss S, McDermott A. Measurement error correction for logistic regression models with an alloyed gold standard. Am J Epidemiol. 1997;145:184–96.
19. Hu P, Tsiatis AA, Davidian M. Estimating the parameters in the Cox model when covariate variables are measured with error. Biometrics. 1998;54:1407–19.
20. Yuan Z, Cooper GS, Einstadter D, Cebul RD, Rimm AA. The association between hospital type and mortality and length of stay. Med Care. 2000;38:231–45.
21. Adler GS. A profile of the Medicare Current Beneficiary Survey. Health Care Financing Review. 1994;15:153–63.
22. Adler GS. Medicare beneficiaries rate their medical care. Health Care Financing Review. 1995;16:175–87.
23. Eppig FJ, Chulis GS. Matching MCBS and Medicare data: The best of both worlds. Health Care Financing Review. 1997;18:211–29.
24. Kupper LL. Effects of the use of unreliable surrogate variables on the validity of epidemiologic research studies. Am J Epidemiol. 1984;120:643–8.
25. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models. Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–87.
26. Thurston SW, Spiegelman D, Ruppert D. Equivalence of regression calibration methods for main study/external validation study designs. J Stat Plann Inf. 2003;113:527–39.
27. Carroll RJ, Ruppert D, Stefanski LA. Measurement error in nonlinear models. London: Chapman & Hall; 1995.
28. Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview and some applications. Stat Med. 1991;10:585–98.
29. Schmid CH, Rosner B. A Bayesian approach to logistic regression models having measurement error following a mixture distribution. Stat Med. 1993;12:1141–53.
30. Gustafson P. Measurement error and misclassification in statistics and epidemiology – impacts and Bayesian adjustments. Boca Raton, FL: Chapman & Hall/CRC Interdisciplinary Statistics Series; 2004.
31. Stürmer T, Thürigen D, Spiegelman D, et al. The performance of measurement error correction methods for the analysis of case-control studies with validation data; a simulation study. Epidemiology. 2002;13:507–16.
32. Spiegelman D, Carroll RJ, Kipnis V. Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument. Stat Med. 2001;20:139–60.
33. Greenland S. Statistical uncertainty due to misclassification: implications for validation substudies. J Clin Epidemiol. 1988;41:1167–74.
34. Spiegelman D, Gray R. Cost-efficient study designs for binary response data with Gaussian covariate measurement error. Biometrics. 1991;47:851–69.
35. Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V. Principles for modeling propensity scores in medical research: a systematic literature review. Pharmacoepidemiol Drug Saf. 2004 (in press online).
36. Glynn RJ, Knight EL, Levin R, et al. Paradoxical relations of drug treatment with mortality in older persons. Epidemiology. 2001;12:682–9.
37. Garcia Rodriguez LA, Hernandez-Diaz S. Relative risk of upper gastrointestinal complications among users of acetaminophen and nonsteroidal anti-inflammatory drugs. Epidemiology. 2001;12:570–6.
38. Hernandez-Diaz S, Garcia Rodriguez LA. Epidemiologic assessment of the safety of conventional nonsteroidal anti-inflammatory drugs. Am J Med. 2001;110 (Suppl 3A):20S–7.
39. Ofman JJ, MacLean CH, Straus WL, et al. A metaanalysis of severe upper gastrointestinal complications of nonsteroidal antiinflammatory drugs. J Rheumatol. 2002;29:804–12.
40. Solomon DH, Glynn RJ, Bohn R, et al. The hidden cost of nonselective nonsteroidal anti-inflammatory drugs in older patients. J Rheumatol. 2003;30:792–8.
41. Gurwitz JH, Avorn J, Ross-Degnan D, et al. Nonsteroidal anti-inflammatory drug-associated azotemia in the very old. JAMA. 1990;264:471–5.
42. Field TS, Gurwitz JH, Glynn RJ, et al. The renal effects of nonsteroidal anti-inflammatory drugs in old people: findings from the Established Populations for Epidemiologic Studies of the Elderly. J Am Geriatr Soc. 1999;47:507–11.
43. Stürmer T, Erb A, Keller F, et al. Determinants of impaired renal function with use of nonsteroidal anti-inflammatory drugs: the importance of half-life and other medications. Am J Med. 2001;111:521–7.
44. Gurwitz JH, Avorn J, Bohn RL, Glynn RJ, Monane M, Mogun H. Initiation of antihypertensive treatment during nonsteroidal anti-inflammatory drug therapy. JAMA. 1994;272:781–6.
45. Dedier J, Stampfer MJ, Hankinson SE, Willett WC, Speizer FE, Curhan GC. Nonnarcotic analgesic use and the risk of hypertension in US women. Hypertension. 2002;40:604–8.
46. Willett W. An overview of issues related to the correction of nondifferential exposure measurement error in epidemiologic studies. Stat Med. 1989;8:1041–9.
47. Wacholder S, Weinberg CR. Flexible maximum likelihood methods for assessing joint effects in case-control studies with complex sampling. Biometrics. 1994;50:350–7.
48. Holford TR, Stack C. Study design for epidemiologic studies with measurement error. Stat Method Med Res. 1995;4:339–58.
49. Collet JP, Schaubel D, Hanley J, et al. Controlling confounding when studying large pharmacoepidemiologic databases: a case study of the two-stage sampling design. Epidemiology. 1998;9:309–15.
50. Chatterjee N, Wacholder S. Validation Studies: Bias, Efficiency, and Exposure Assessment. Epidemiology. 2002;13:503–6.
51. Sjahid SI, van der Linden PD, Stricker BH. Agreement between the pharmacy medication history and patient interview for cardiovascular drugs: the Rotterdam elderly study. Br J Clin Pharmacol. 1998;45:591–5.
52. Carroll RJ, Stefanski LA. Approximate quasi-likelihood estimation in models with surrogate predictors. J Am Stat Assoc. 1990;85:652–63.
53. Rubin DB. The use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics. 1973;29:185–203.
54. Greenland S. Randomization, statistics, and causal inference. Epidemiology. 1990;1:421–9.